Research Article
The Discriminative Lexicon: A Unified Computational Model for the Lexicon and Lexical Processing in Comprehension and Production Grounded Not in (De)Composition but in Linear Discriminative Learning

R. Harald Baayen,1 Yu-Ying Chuang,1 Elnaz Shafaei-Bajestan,1 and James P. Blevins2

1 Seminar für Sprachwissenschaft, Eberhard-Karls University of Tübingen, Wilhelmstrasse 19, 72074 Tübingen, Germany
2 Homerton College, University of Cambridge, Hills Road, Cambridge CB2 8PH, UK

Correspondence should be addressed to R. Harald Baayen; harald.baayen@gmail.com

Received 1 June 2018; Accepted 14 November 2018; Published 1 January 2019

Guest Editor: Dirk Wulff

Copyright © 2019 R. Harald Baayen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The discriminative lexicon is introduced as a mathematical and computational model of the mental lexicon. This novel theory is inspired by word and paradigm morphology but operationalizes the concept of proportional analogy using the mathematics of linear algebra. It embraces the discriminative perspective on language, rejecting the idea that words' meanings are compositional in the sense of Frege and Russell and arguing instead that the relation between form and meaning is fundamentally discriminative. The discriminative lexicon also incorporates the insight from machine learning that end-to-end modeling is much more effective than working with a cascade of models targeting individual subtasks. The computational engine at the heart of the discriminative lexicon is linear discriminative learning: simple linear networks are used for mapping form onto meaning and meaning onto form, without requiring the hierarchies of post-Bloomfieldian 'hidden' constructs such as phonemes, morphemes, and stems. We show that this novel model meets the criteria of accuracy (it properly recognizes words and produces words correctly), productivity (the model is remarkably successful in understanding and producing novel complex words), and predictivity (it correctly predicts a wide array of experimental phenomena in lexical processing). The discriminative lexicon does not make use of static representations that are stored in memory and that have to be accessed in comprehension and production. It replaces static representations by states of the cognitive system that arise dynamically as a consequence of external or internal stimuli. The discriminative lexicon brings together visual and auditory comprehension as well as speech production into an integrated dynamic system of coupled linear networks.

1. Introduction

Theories of language and language processing have a long history of taking inspiration from mathematics and computer science. For more than a century, formal logic has been influencing linguistics [1–3], and one of the most widely-known linguistic theories, generative grammar, has strong roots in the mathematics of formal languages [4, 5]. Similarly, Bayesian inference is currently seen as an attractive framework for understanding language processing [6].

However, current advances in machine learning present linguistics with new challenges and opportunities. Methods across machine learning, such as random forests [7, 8] and deep learning [9–11], offer unprecedented prediction accuracy. At the same time, these new approaches confront linguistics with deep fundamental questions, as these models typically are so-called end-to-end models that eschew representations for standard linguistic constructs such as phonemes, morphemes, syllables, and word forms.

There are mainly three possible responses to these new algorithms. A first response is to dismiss machine learning as irrelevant for understanding language and cognition. Given that machine learning algorithms currently outperform algorithms that make use of standard concepts from linguistics, this response is unlikely to be productive in the long run.



A second response is to interpret the units on the hidden layers of deep learning networks as capturing the representations and their hierarchical organization familiar from standard linguistic frameworks. The dangers inherent in this approach are well illustrated by the deep learning model proposed by Hannagan et al. [12] for lexical learning in baboons [13]. The hidden layers of their deep learning network were claimed to correspond to areas in the ventral pathway of the primate brain. However, Scarf et al. [14] reported that pigeons can also learn to discriminate between English words and nonword strings. Given that the avian brain is organized very differently from the primate brain and yet accomplishes the same task, the claim that the layers of Hannagan et al.'s deep learning network correspond to areas in the ventral pathway must be premature. Furthermore, Linke et al. [15] showed that baboon lexical learning can be modeled much more precisely by a two-layer wide learning network. It is also noteworthy that while for vision some form of hierarchical layering of increasingly specific receptive fields is well established [16], it is unclear whether similar neural organization characterizes auditory comprehension and speech production.

A third response is to take the ground-breaking results from machine learning as a reason for rethinking, against the backdrop of linguistic domain knowledge and at the functional level, the nature of the algorithms that underlie language learning and language processing.

The present study presents the results of an ongoing research program that exemplifies the third kind of response, narrowed down to the lexicon and focusing on those algorithms that play a central role in making possible the basics of visual and auditory comprehension as well as speech production. The model that we propose here brings together several strands of research across theoretical morphology, psychology, and machine learning. Our model can be seen as a computational implementation of paradigmatic analogy in word and paradigm morphology. From psychology, our model inherits the insight that classes and categories are constantly recalibrated as experience unfolds, and that this recalibration can be captured to a considerable extent by very simple principles of error-driven learning. We adopted end-to-end modeling from machine learning, but we kept our networks as simple as possible, as the combination of smart features and linear mappings is surprisingly powerful [17, 18]. Anticipating discussion of technical details, we implement linear networks (mathematically, linear mappings) that are based entirely on discrimination as learning mechanism and that work with large numbers of features at much lower levels of representation than in current and classical models.

Section 2 provides the theoretical background for this study and introduces the central ideas of the discriminative lexicon that are implemented in subsequent sections. Section 3 introduces our operationalization of lexical semantics. Section 4 discusses visual and auditory comprehension, and Section 5 presents how we approach the modeling of speech production. This is followed by a brief discussion of how time can be brought into the model (Section 6). In the final section, we discuss the implications of our results.

2. Background

This section lays out some prerequisites that we will build on in the remainder of this study. We first introduce Word and Paradigm morphology, an approach to word structure developed in theoretical morphology within the broader context of linguistics. Word and Paradigm morphology provides the background for our highly critical evaluation of the morpheme as theoretical unit. The many problems associated with morphemes motivated the morpheme-free computational model developed below. Section 2.2 introduces the idea of vector representations for word meanings as developed within computational linguistics and computer science. Semantic vectors lie at the heart of how we model word meaning. Section 2.3 explains how we calculated the semantic vectors that we used in this study. Section 2.4 discusses naive discriminative learning and explains why, when learning the relation between form vectors and semantic vectors, it is advantageous to use linear discriminative learning instead of naive discriminative learning. The last subsection explains how linear discriminative learning provides a mathematical formalization of the notion of proportional analogy that is central to Word and Paradigm morphology.

2.1. Word and Paradigm Morphology. The dominant view of the mental lexicon in psychology and cognitive science is well represented by Zwitserlood [19, p. 583]:

Parsing and composition – for which there is ample evidence from many languages – require morphemes to be stored, in addition to information as to how morphemes are combined, or to whole-word representations specifying the combination.

Words are believed to be built from morphemes, either by rules or by constructional schemata [20], and the meanings of complex words are assumed to be a compositional function of the meanings of their parts.

This perspective on word structure, which has found its way into almost all introductory textbooks on morphology, has its roots in post-Bloomfieldian American structuralism [21]. However, many subsequent studies have found this perspective to be inadequate [22]. Beard [23] pointed out that in language change, morphological form and morphological meaning follow their own trajectories, and the theoretical construct of the morpheme as a minimal sign combining form and meaning therefore stands in the way of understanding the temporal dynamics of language. Before him, Matthews [24, 25] had pointed out that the inflectional system of Latin is not well served by analyses positing that its fusional system is best analyzed as underlyingly agglutinative (i.e., as a morphological system in which words consist of sequences of morphemes, as (approximately) in Turkish; see also [26]). Matthews argued that words are the basic units, and that proportional analogies between words within paradigms (e.g., walk/walks = talk/talks) make the lexicon as a system productive. By positing the word as basic unit, Word and Paradigm morphology avoids a central problem that confronts morpheme-based theories, namely, that systematicities in form can exist without corresponding systematicities in meaning. Minimal variation in form can serve to distinguish words, or to predict patterns of variation elsewhere in a paradigm or class. One striking example is given by the locative cases in Estonian, which express meanings that in English would be realized with locative and directional prepositions. Interestingly, most plural case endings in Estonian are built on the form of the partitive singular (a grammatical case that can mark, for instance, the direct object). However, the semantics of these plural case forms do not express in any way the semantics of the singular and the partitive. For instance, the form for the partitive singular of the noun for 'leg' is jalga. The form for expressing 'on the legs', jalgadele, takes the form of the partitive singular and adds the formatives for plural and the locative case for 'on', the so-called adessive (see [21, 27] for further details). Matthews [28, p. 92] characterized the adoption by Chomsky and Halle [29] of the morpheme as 'a remarkable tribute to the inertia of ideas', and the same characterization pertains to the adoption of the morpheme by mainstream psychology and cognitive science. In the present study, we adopt from Word and Paradigm morphology the insight that the word is the basic unit of analysis. Where we take Word and Paradigm morphology a step further is in how we operationalize proportional analogy. As will be laid out in more detail below, we construe analogies as being system-wide rather than constrained to paradigms, and our mathematical formalization better integrates semantic analogies with formal analogies.

2.2. Distributional Semantics. The question that arises at this point is what word meanings are. In this study, we adopt the approach laid out by Landauer and Dumais [30] and approximate word meanings by means of semantic vectors, referred to as embeddings in computational linguistics. Weaver [31] and Firth [32] noted that words with similar distributions tend to have similar meanings. This intuition can be formalized by counting how often words co-occur across documents or within some window of text. In this way, a word's meaning comes to be represented by a vector of reals, and the semantic similarity between two words is evaluated by means of the similarities of their vectors. One such measure is the cosine similarity of the vectors; a related measure is the Pearson correlation of the vectors. Many implementations of the same general idea are available, such as hal [33], HiDEx [34], and word2vec [35]. Below, we provide further detail on how we estimated semantic vectors.

There are two ways in which semantic vectors can be conceptualized within the context of theories of the mental lexicon. First, semantic vectors could be fixed entities that are stored in memory, in a way reminiscent of a standard printed or electronic dictionary, the entries of which consist of a search key (a word's form) and a meaning specification (the information accessed through the search key). This conceptualization is very close to the currently prevalent way of thinking in psycholinguistics, which has adopted a form of naive realism in which word meanings are typically associated with monadic concepts (see, e.g., [36–39]). It is worth noting that when one replaces these monadic concepts by semantic vectors, the general organization of (paper and electronic) dictionaries can still be retained, resulting in research questions addressing the nature of the access keys (morphemes, whole words, perhaps both) and the process of accessing these keys.

However, there is good reason to believe that word meanings are not fixed, static representations. The literature on categorization indicates that the categories (or classes) that arise as we interact with our environment are constantly recalibrated [40–42]. A particularly eloquent example is given by Marsolek [41]. In a picture naming study, he presented subjects with sequences of two pictures and asked them to say aloud the name of the second picture. The critical manipulation concerned the similarity of the two pictures. When the first picture was very similar to the second (e.g., a grand piano and a table), responses were slower compared to the control condition in which visual similarity was reduced (orange and table). What we see at work here is error-driven learning: when understanding the picture of a grand piano as signifying a grand piano, features such as having a large flat surface are strengthened to the grand piano and weakened to the table. As a consequence, interpreting the picture of a table has become more difficult, which in turn slows the word naming response.

2.3. Discrimination Learning of Semantic Vectors. What is required, therefore, is a conceptualization of word meanings that allows for the continuous recalibration of these meanings as we interact with the world. To this end, we construct semantic vectors using discrimination learning. We take the sentences from a text corpus and, for each sentence, in the order in which these sentences occur in the corpus, train a linear network to predict the words in that sentence from the words in that sentence. The training of the network is accomplished with a simplified version of the learning rule of Rescorla and Wagner [43], basically the learning rule of Widrow and Hoff [44]. We denote the connection strength from input word i to output word j by w_ij. Let δ_i denote an indicator variable that is 1 when input word i is present in sentence t and zero otherwise. Likewise, let δ_j denote whether output word j is present in the sentence. Given a learning rate ρ (ρ ≪ 1), the change in the connection strength from (input) word i to (output) word j for sentence (corpus time) t, Δ_ij, is given by

    Δ_ij = δ_i ρ ( δ_j - Σ_k δ_k w_kj ),     (1)

where δ_i and δ_j vary from sentence to sentence, depending on which cues and outcomes are present in a sentence. Given n distinct words, an n × n network is updated incrementally in this way. The row vectors of the network's weight matrix S define words' semantic vectors. Below, we return in more detail to how we derived the semantic vectors used in the present study. Importantly, we now have semantic representations that are not fixed but dynamic: they are constantly recalibrated from sentence to sentence. (The property of semantic vectors changing over time as experience unfolds is not unique to the present approach, but is shared with other algorithms for constructing semantic vectors, such as word2vec. However, time-variant semantic vectors are in stark contrast to the monadic units representing meanings in models such as those proposed by Baayen et al. [45], Taft [46], and Levelt et al. [37]. The distributed representations for words' meanings in the triangle model [47] were derived from WordNet and hence are also time-invariant.) Thus, in what follows, a word's meaning is defined as a semantic vector that is a function of time; it is subject to continuous change. By itself, without context, a word's semantic vector at time t reflects its 'entanglement' with other words.
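To make learning rule (1) concrete, the following sketch updates a lexome-by-lexome weight matrix sentence by sentence. It is a minimal illustration rather than the code used in the study: the toy vocabulary, the toy corpus, and the learning rate are assumptions chosen only so that the snippet runs.

    import numpy as np

    # Toy vocabulary and corpus; the model reported below is trained on the
    # TASA corpus with 23,562 lexomes.
    vocab = ["the", "boy", "happiness", "be", "great", "to", "observe",
             "PL", "NESS", "PAST"]
    idx = {w: i for i, w in enumerate(vocab)}
    sentences = [
        ["the", "boy", "PL", "happiness", "NESS", "be", "PAST",
         "great", "to", "observe"],
        ["the", "boy", "be", "great"],
    ]

    rho = 0.001                              # learning rate, rho << 1
    W = np.zeros((len(vocab), len(vocab)))   # lexome-to-lexome weights

    for sentence in sentences:               # one sentence = one learning event
        cues = sorted({idx[w] for w in sentence})
        outcomes = np.zeros(len(vocab))
        outcomes[cues] = 1.0                 # delta_j: which outcomes are present
        # prediction error for every outcome, given the cues present in this sentence
        error = outcomes - W[cues, :].sum(axis=0)
        W[cues, :] += rho * error            # equation (1), applied to all present cues

    S = W    # the row vectors of W are the (time-varying) semantic vectors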

Obviously, it is extremely unlikely that our understanding of the classes and categories that we discriminate between is well approximated just by lexical co-occurrence statistics. However, we will show that text-based semantic vectors are good enough as a first approximation for implementing a computationally tractable discriminative lexicon.

Returning to the learning rule (1), we note that it is well known to capture important aspects of the dynamics of discrimination learning [48, 49]. For instance, it captures the blocking effect [43, 50, 51], as well as order effects in learning [52]. The blocking effect is the finding that when one feature has been learned to perfection, adding a novel feature does not improve learning. This follows from (1): as w_ij tends towards 1, Δ_ij tends towards 0. A novel feature, even though by itself it is perfectly predictive, is blocked from developing a strong connection weight of its own. The order effect has been observed for, e.g., the learning of words for colors. Given objects of fixed shape and size but varying color, and the corresponding color words, it is essential that the objects are presented to the learner before the color word. This ensures that the object's properties become the features for discriminating between the color words. In this case, application of the learning rule (1) will result in a strong connection between the color features of the object and the appropriate color word. If the order is reversed, the color words become features predicting object properties. In this case, application of (1) will result in weights from a color word to object features that are proportional to the probabilities of the object's features in the training data.
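The blocking effect follows directly from (1) and can be checked with a two-cue simulation; the learning rate and trial counts below are arbitrary illustrative choices.

    import numpy as np

    rho = 0.1
    w = np.zeros(2)   # weights from cue A (index 0) and cue B (index 1) to one outcome

    # Phase 1: cue A alone is followed by the outcome; w[0] climbs towards 1.
    for _ in range(100):
        w[0] += rho * (1.0 - w[0])

    # Phase 2: cues A and B occur together, again followed by the outcome.
    for _ in range(100):
        error = 1.0 - (w[0] + w[1])   # near zero, because A already predicts the outcome
        w += rho * error              # cue B is blocked from acquiring weight

    print(np.round(w, 3))             # approximately [1.0, 0.0]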

Given the dynamics of discriminative learning, static lexical entries are not useful. Anticipating more detailed discussion below, we argue that actually the whole dictionary metaphor of lexical access is misplaced: in comprehension, meanings are dynamically created from form, and in production, forms are dynamically created from meanings. Importantly, what forms and meanings are created will vary from case to case, as all learning is fundamentally incremental and subject to continuous recalibration. In order to make the case that this is actually possible and computationally feasible, we first introduce the framework of naive discriminative learning, discuss its limitations, and then lay out how this approach can be enhanced to meet the goals of this study.

2.4. Naive Discriminative Learning. Naive discriminative learning (ndl) is a computational framework grounded in learning rule (1) that was inspired by prior work on discrimination learning (e.g., [52, 53]). A model for reading complex words was introduced by Baayen et al. [54]. This study implemented a two-layer linear network with letter pairs as input features (henceforth cues) and units representing meanings as output classes (henceforth outcomes). Following Word and Paradigm morphology, this model does not include form representations for morphemes or word forms. It is an end-to-end model for the relation between form and meaning that sets itself the task to predict word meanings from sublexical orthographic features. Baayen et al. [54] were able to show that the extent to which cues in the visual input support word meanings, as gauged by the activation of the corresponding outcomes in the network, reflects a wide range of phenomena reported in the lexical processing literature, including the effects of frequency of occurrence, family size [55], relative entropy [56], phonaesthemes (nonmorphemic sounds with a consistent semantic contribution, such as gl in glimmer, glow, and glisten) [57], and morpho-orthographic segmentation [58].

However, the model of Baayen et al. [54] adopted a stark form of naive realism, albeit just for reasons of computational convenience: a word's meaning was defined in terms of the presence or absence of an outcome (δ_j in (1)). Subsequent work sought to address this shortcoming by developing semantic vectors, using learning rule (1) to predict words from words, as explained above [58, 59]. The term "lexome" was introduced as a technical term for the outcomes in a form-to-meaning network, conceptualized as pointers to semantic vectors.

In this approach, lexomes do double duty. In a first network, lexomes are the outcomes that the network is trained to discriminate between, given visual input. In a second network, the lexomes are the 'atomic' units in a corpus that serve as the input for building a distributional model with semantic vectors. Within the theory unfolded in this study, the term lexome refers only to the elements from which a semantic vector space is constructed. The dimension of this vector space is equal to the number of lexomes, and each lexome is associated with its own semantic vector.

It turns out, however, that mathematically, a network discriminating between lexomes given orthographic cues, as in naive discriminative learning models, is suboptimal. In order to explain why the set-up of ndl is suboptimal, we need some basic concepts and notation from linear algebra, such as matrix multiplication and matrix inversion. Readers unfamiliar with these concepts are referred to Appendix A, which provides an informal introduction.

2.4.1. Limitations of Naive Discriminative Learning. Mathematically, naive discriminative learning works with two matrices: a cue matrix C that specifies words' form features, and a matrix S that specifies the targeted lexomes [61]. We illustrate the cue matrix for four words, aaa, aab, abb, and abb, and their letter bigrams aa, ab, and bb. A cell c_ij is 1 if word i has bigram j, and zero otherwise:

    C =
              aa  ab  bb
       aaa  (  1   0   0 )
       aab  (  1   1   0 )
       abb  (  0   1   1 )
       abb  (  0   1   1 )     (2)

The third and fourth words are homographs: they share exactly the same bigrams. We next define a target matrix S that defines, for each of the four words, the corresponding lexome λ. This is done by setting one bit to 1 and all other bits to zero in the row vectors of this matrix:

    S =
              λ1  λ2  λ3  λ4
       aaa  (  1   0   0   0 )
       aab  (  0   1   0   0 )
       abb  (  0   0   1   0 )
       abb  (  0   0   0   1 )     (3)

The problem that arises at this point is that the word forms jointly set up a space with three dimensions, whereas the targeted lexomes set up a four-dimensional space. This is problematic for the following reason. A linear mapping of a lower-dimensional space A onto a higher-dimensional space B will result in a subspace of B with a dimensionality that cannot be greater than that of A [62]. If A is a space in R^2 and B is a space in R^3, a linear mapping of A onto B will result in a plane in B. All the points in B that are not on this plane cannot be reached from A with the linear mapping.

As a consequence, it is impossible for ndl to perfectly discriminate between the four lexomes (which set up a four-dimensional space) given their bigram cues (which jointly define a three-dimensional space). For the input bb, the network therefore splits its support equally over the lexomes λ3 and λ4. The transformation matrix F is

    F = C′S =
              λ1   λ2   λ3   λ4
       aa  (   1    0    0    0  )
       ab  (  -1    1    0    0  )
       bb  (   1   -1   0.5  0.5 )     (4)

and the matrix with predicted vectors Ŝ is

    Ŝ = CF =
              λ1   λ2   λ3   λ4
       aaa  (  1    0    0    0  )
       aab  (  0    1    0    0  )
       abb  (  0    0   0.5  0.5 )
       abb  (  0    0   0.5  0.5 )     (5)

When a cue matrix C and a target matrix S are set up for large numbers of words, the dimension of the ndl cue space will be substantially smaller than the space of S, the dimension of which is equal to the number of lexomes. Thus, in the set-up of naive discriminative learning, we do take into account that words tend to have similar forms, but we do not take into account that words are also similar in meaning. By setting up S as completely orthogonal, we are therefore making it unnecessarily hard to relate form to meaning.
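The worked example in (2) through (5) can be reproduced with a few lines of linear algebra. In this sketch, the mapping F is obtained with the Moore-Penrose pseudoinverse (the standard least-squares solution; treating the C′ of equation (4) as this pseudoinverse is our assumption for illustration), and the two homographic words indeed end up splitting their support over λ3 and λ4.

    import numpy as np

    # Cue matrix C: rows are the words aaa, aab, abb, abb; columns are the bigrams aa, ab, bb.
    C = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 1, 1]], dtype=float)

    # Target matrix S: one-bit-on row vectors, one column per lexome (equation (3)).
    S = np.eye(4)

    F = np.linalg.pinv(C) @ S     # least-squares estimate of the form-to-meaning mapping
    S_hat = C @ F                 # predicted semantic vectors (equation (5))

    print(np.round(F, 2))         # rows aa, ab, bb as in equation (4)
    print(np.round(S_hat, 2))     # both abb rows come out as (0, 0, 0.5, 0.5)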

This study therefore implements discrimination learning with semantic vectors of reals replacing the one-bit-on row vectors of the target matrix S. By doing so, we properly reduce the dimensionality of the target space, which in turn makes discriminative learning more accurate. We will refer to this new implementation of discrimination learning as linear discriminative learning (ldl) instead of naive discriminative learning, as the outcomes are no longer assumed to be independent, and the networks are mathematically equivalent to linear mappings onto continuous target matrices.

2.5. Linear Transformations and Proportional Analogy. A central concept in word and paradigm morphology is that the form of a regular inflected form stands in a relation of proportional analogy to other words in the inflectional system. As explained by Matthews ([25], p. 192f):

In effect, we are predicting the inflections of servus by analogy with those of dominus. As Genitive Singular domini is to Nominative Singular dominus, so x (unknown) must be to Nominative Singular servus. What then is x? Answer: it must be servi. In notation: dominus : domini = servus : servi. ([25], p. 192f)

Here, form variation is associated with 'morphosyntactic features'. Such features are often naively assumed to function as proxies for a notion of 'grammatical meaning'. However, in word and paradigm morphology these features actually represent something more like distribution classes. For example, the accusative singular forms of nouns belonging to different declensions will typically differ in form and be identified as the 'same' case in different declensions by virtue of distributional parallels. The similarity in meaning then follows from the distributional hypothesis, which proposes that linguistic items with similar distributions have similar meanings [31, 32]. Thus, the analogy of forms

    dominus : domini = servus : servi     (6)

is paralleled by an analogy of distributions d:

    d(dominus) : d(domini) = d(servus) : d(servi).     (7)

Borrowing notational conventions of matrix algebra, we can write

    ( dominus )     ( d(dominus) )
    ( domini  )  ~  ( d(domini)  )
    ( servus  )     ( d(servus)  )
    ( servi   )     ( d(servi)   )     (8)

In this study, we operationalize the proportional analogy of word and paradigm morphology by replacing the word forms in (8) by vectors of features that are present or absent in a word, and by replacing words' distributions by semantic vectors. Linear mappings between form and meaning formalize two distinct proportional analogies: one analogy for going from form to meaning, and a second analogy for going from meaning to form. Consider, for instance, a semantic matrix S and a form matrix C, with rows representing words:

    S = ( a1  a2 )     C = ( p1  p2 )
        ( b1  b2 )         ( q1  q2 )     (9)

The transformation matrix G that maps S onto C,

    G = S^(-1) C
      = 1/(a1 b2 - a2 b1)  (  b2  -a2 )  ( p1  p2 )
                           ( -b1   a1 )  ( q1  q2 )
      = 1/(a1 b2 - a2 b1)  [ (  b2 p1    b2 p2 )  +  ( -a2 q1   -a2 q2 ) ]
                           [ ( -b1 p1   -b1 p2 )     (  a1 q1    a1 q2 ) ]     (10)

can be written as the sum of two matrices (both of which are scaled by 1/(a1 b2 - a2 b1)). The first matrix takes the first form vector of C and weights it by the elements of the second semantic vector. The second matrix takes the second form vector of C and weights this by the elements of the first semantic vector of S. Thus, with this linear mapping, a predicted form vector is a semantically weighted mixture of the form vectors of C.
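The algebra in (9) and (10) is easy to verify numerically; the concrete values assigned to a1 through q2 below are arbitrary and serve only to check that G = S^(-1)C equals the sum of the two semantically weighted matrices.

    import numpy as np

    # Arbitrary example values for the semantic matrix S and form matrix C of equation (9).
    a1, a2, b1, b2 = 1.0, 2.0, 3.0, 5.0
    p1, p2, q1, q2 = 0.5, 1.5, 2.5, 3.5
    S = np.array([[a1, a2], [b1, b2]])
    C = np.array([[p1, p2], [q1, q2]])

    G = np.linalg.inv(S) @ C      # maps semantic row vectors onto form row vectors

    # Equation (10): the same mapping as a sum of two matrices, scaled by 1/(a1*b2 - a2*b1).
    det = a1 * b2 - a2 * b1
    G_sum = (np.array([[ b2 * p1,  b2 * p2],
                       [-b1 * p1, -b1 * p2]]) +
             np.array([[-a2 * q1, -a2 * q2],
                       [ a1 * q1,  a1 * q2]])) / det

    assert np.allclose(G, G_sum)
    print(S[0] @ G)               # the predicted form vector of word 1: (p1, p2)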

The remainder of this paper is structured as follows. We first introduce the semantic vector space S that we will be using, which we derived from the tasa corpus [63, 64]. Next, we show that we can model word comprehension by means of a matrix (or network) F that transforms the cue row vectors of C into the semantic row vectors of S, i.e., CF = S. We then show that we can model the production of word forms given their semantics by a transformation matrix G, i.e., SG = C, where C is a matrix specifying for each word which triphones it contains. As the network is informative only about which triphones are at issue for a word, but remains silent about their order, an algorithm building on graph theory is used to properly order the triphones. All models are evaluated on their accuracy and are also tested against experimental data.
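The comprehension and production mappings sketched in this roadmap can be prototyped in a few lines. The random matrices below are stand-ins for the real triphone and semantic matrices, and scoring a word as correct when its predicted vector correlates most strongly with its own target row is just one simple way of operationalizing accuracy, assumed here for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    n_words, n_cues, n_dims = 100, 300, 150
    C = (rng.random((n_words, n_cues)) < 0.05).astype(float)   # stand-in form (e.g., triphone) matrix
    S = rng.normal(size=(n_words, n_dims))                     # stand-in semantic matrix

    F = np.linalg.lstsq(C, S, rcond=None)[0]    # comprehension: C F approximates S
    G = np.linalg.lstsq(S, C, rcond=None)[0]    # production:    S G approximates C

    def accuracy(predicted, target):
        # a word counts as correct if its predicted vector correlates most strongly
        # with its own target row vector
        n = len(predicted)
        r = np.corrcoef(predicted, target)[:n, n:]
        return np.mean(np.argmax(r, axis=1) == np.arange(n))

    print(accuracy(C @ F, S), accuracy(S @ G, C))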

In the general discussion, we will reflect on the consequences for linguistic theory of our finding that it is indeed possible to model the lexicon and lexical processing using systemic discriminative learning, without requiring 'hidden units' such as morphemes and stems.

3. A Semantic Vector Space Derived from the TASA Corpus

The first part of this section describes how we constructed semantic vectors. Specific to our approach is that we constructed semantic vectors not only for content words but also for inflectional and derivational functions. The semantic vectors for these functions are of especial importance for the speech production component of the model. The second part of this section addresses the validation of the semantic vectors, for which we used paired associate learning scores, semantic relatedness ratings, and semantic plausibility and semantic transparency ratings for derived words.

3.1. Constructing the Semantic Vector Space. The semantic vector space that we use in this study is derived from the tasa corpus [63, 64]. We worked with 752,130 sentences from this corpus, to a total of 10,719,386 word tokens. We used the treetagger software [65] to obtain for each word token its stem and a part of speech tag. Compounds written with a space, such as apple pie and jigsaw puzzle, were, when listed in the celex lexical database [66], joined into one onomasiological unit. Information about words' morphological structure was retrieved from celex.

In computational morphology, several options have been explored for representing the meaning of affixes. Mitchell and Lapata [67] proposed to model the meaning of an affix as a vector that, when added to the vector b of a base word, gives the vector d of the derived word. They estimated this vector by calculating d_i - b_i for all available pairs i of base and derived words and taking the average of the resulting vectors. Lazaridou et al. [68] and Marelli and Baroni [60] modeled the semantics of affixes by means of matrix operations that take b as input and produce d as output, i.e.,

    d = Mb,     (11)

whereas Cotterell et al. [69] constructed semantic vectors as latent variables in a Gaussian graphical model. What these approaches have in common is that they derive or impute semantic vectors given the semantic vectors of words produced by algorithms such as word2vec, algorithms which work with (stemmed) words as input units.

From the perspective of discriminative linguistics, however, it does not make sense to derive one meaning from another. Furthermore, rather than letting writing conventions dictate what units are accepted as input to one's favorite tool for constructing semantic vectors (including not and again, but excluding the prefixes un-, in-, and re-), we included not only lexomes for content words but also lexomes for prefixes and suffixes as units for constructing a semantic vector space. As a result, semantic vectors for prefixes and suffixes are obtained straightforwardly, together with semantic vectors for content words.

To illustrate this procedure, consider the sentence the boys' happiness was great to observe. Standard methods will apply stemming and remove stop words, resulting in the lexomes boy, happiness, great, and observe being the input to the algorithm constructing semantic vectors from lexical co-occurrences. In our approach, by contrast, the lexomes considered are the, boy, happiness, be, great, to, and observe, and in addition plural, ness, and past. Stop words are retained. (Although including function words may seem counterproductive from the perspective of semantic vectors in natural language processing applications, they are retained in the present approach for two reasons. First, since in our model semantic vectors drive speech production, in order to model the production of function words, semantic vectors for function words are required. Second, although the semantic vectors of function words typically are less informative (they typically have small association strengths to very large numbers of words), they still show structure, as illustrated in Appendix C for pronouns and prepositions.) Inflectional endings are replaced by the inflectional functions they subserve, and for derived words, the semantic function of the derivational suffix is identified as well. In what follows, we outline in more detail what considerations motivate this way of constructing the input for our algorithm constructing semantic vectors. We note here that our method for handling morphologically complex words can be combined with any of the currently available algorithms for building semantic vectors. We also note here that we are not concerned with the question of how lexomes for plurality or tense might be induced from word forms, from world knowledge, or any combination of the two. All we assume is, first, that anyone understanding the sentence the boys' happiness was great to observe knows that more than one boy is involved and that the narrator situates the event in the past. Second, we take this understanding to drive learning.
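For concreteness, the example sentence can be turned into its set of lexomes as follows; the hand-written analysis is a hypothetical stand-in for the TreeTagger and CELEX-based annotation actually used.

    # Hypothetical analysis of the example sentence; in the actual pipeline the stems,
    # part-of-speech tags, and morphological parses come from TreeTagger and CELEX.
    analysis = [
        ("the",       ["the"]),
        ("boys'",     ["boy", "PL"]),          # plural inflection -> inflectional lexome PL
        ("happiness", ["happiness", "NESS"]),  # derived word keeps its own content lexome
        ("was",       ["be", "PAST"]),         # inflected word -> stem lexome plus PAST
        ("great",     ["great"]),
        ("to",        ["to"]),
        ("observe",   ["observe"]),
    ]

    def lexomes(analysis):
        """Collect the lexomes that jointly form one learning event (one sentence)."""
        out = []
        for _, lexs in analysis:
            out.extend(lexs)
        return out

    print(lexomes(analysis))
    # ['the', 'boy', 'PL', 'happiness', 'NESS', 'be', 'PAST', 'great', 'to', 'observe']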

We distinguished the following seven inflectional functions: comparative and superlative for adjectives; singular and plural for nouns; and past, perfective, continuous, persistence (which denotes the persistent relevance of the predicate, in sentences such as London is a big city), and person3 (third person singular) for verbs.

The semantics of derived words tend to be characterized by idiosyncrasies [70] that can be accompanied by formal idiosyncrasies (e.g., business and resound), but often are formally unremarkable (e.g., heady, with meanings as diverse as intoxicating, exhilarating, impetuous, and prudent). We therefore paired derived words with their own content lexomes as well as with a lexome for the derivational function expressed in the derived word. We implemented the following derivational lexomes: ordinal for ordinal numbers; not for negative un and in; undo for reversative un; other for nonnegative in and its allomorphs; ion for ation, ution, ition, and ion; ee for ee; agent for agents with er; instrument for instruments with er; impagent for impersonal agents with er; causer for words with er expressing causers (the differentiation of different semantic functions for er was based on manual annotation of the list of forms with er; the assignment of a given form to a category was not informed by sentence context and hence can be incorrect); again for re; and ness, ity, ism, ist, ic, able, ive, ous, ize, ence, ful, ish, under, sub, self, over, out, mis, and dis for the corresponding prefixes and suffixes. This set-up of lexomes is informed by the literature on the semantics of English word formation [71–73] and targets affixal semantics irrespective of affixal forms (following [74]).

This way of setting up the lexomes for the semantic vector space model illustrates an important aspect of the present approach, namely, that lexomes target what is understood, and not particular linguistic forms. Lexomes are not units of form. For example, in the study of Geeraert et al. [75], the forms die, pass away, and kick the bucket are all linked to the same lexome die. In the present study, we have not attempted to identify idioms, and likewise no attempt was made to disambiguate homographs. These are targets for further research.

In the present study, we did implement the classical distinction between inflection and word formation, by representing inflected words with a content lexome for their stem, but derived words with a content lexome for the derived word itself. In this way, the sentence scientists believe that exposure to shortwave ultraviolet rays can cause blindness is associated with the following lexomes: scientist, pl, believe, persistence, that, exposure, sg, to, shortwave, ultraviolet, ray, can, present, cause, blindness, and ness. In our model, sentences constitute the learning events, i.e., for each sentence we train the model to predict the lexomes in that sentence from the very same lexomes in that sentence. Sentences, or for spoken corpora utterances, are more ecologically valid than windows of, say, two words preceding and two words following a target word: the interpretation of a word may depend on the company that it keeps at long distances in a sentence. We also did not remove any function words, contrary to standard practice in distributional semantics (see Appendix C for a heatmap illustrating the kind of clustering observable for pronouns and prepositions). The inclusion of semantic vectors for both affixes and function words is necessitated by the general framework of our model, which addresses not only comprehension but also production, and which in the case of comprehension needs to move beyond lexical identification, as inflectional functions play an important role during the integration of words in sentence and discourse.

Only words with a frequency exceeding 8 occurrences in the tasa corpus were assigned lexomes. This threshold was imposed in order to avoid excessive data sparseness when constructing the distributional vector space. The number of different lexomes that met the criterion for inclusion was 23,562.

We constructed a semantic vector space by training an ndl network on the tasa corpus, using the ndl2 package for R [76]. Weights on lexome-to-lexome connections were recalibrated sentence by sentence, in the order in which they appear in the tasa corpus, using the learning rule of ndl (i.e., a simplified version of the learning rule of Rescorla and Wagner [43] that has only two free parameters: the maximum amount of learning λ (set to 1) and a learning rate ρ (set to 0.001)). This resulted in a 23,562 × 23,562 matrix. Sentence-based training keeps the carbon footprint of the model down, as the number of learning events is restricted to the number of utterances (approximately 750,000) rather than the number of word tokens (approximately 10,000,000). The row vectors of the resulting matrix, henceforth S, are the semantic vectors that we will use in the remainder of this study. (Work is in progress to further enhance the S matrix by including word sense disambiguation and named entity recognition when setting up lexomes. The resulting lexomic version of the TASA corpus and scripts (in python as well as in R) for deriving the S matrix will be made available at http://www.sfs.uni-tuebingen.de/~hbaayen.)

The weights on the main diagonal of the matrix tend to be high, unsurprisingly, as in each sentence on which the model is trained, each of the words in that sentence is an excellent predictor of that same word occurring in that sentence. When the focus is on semantic similarity, it is useful to set the main diagonal of this matrix to zero. However, for more general association strengths between words, the diagonal elements are informative and should be retained.

Unlike standard models of distributional semantics, we did not carry out any dimension reduction using, for instance, singular value decomposition. Because we do not work with latent variables, each dimension of our semantic space is given by a column vector of S, and hence each dimension is linked to a specific lexome. Thus, the row vectors of S specify, for each of the lexomes, how well this lexome discriminates between all the lexomes known to the model.

Although semantic vectors of length 23,562 can provide good results, vectors of this length can be challenging for statistical evaluation. Fortunately, it turns out that many of the column vectors of S are characterized by extremely small variance. Such columns can be removed from the matrix without loss of accuracy. In practice, we have found it suffices to work with approximately 4 to 5 thousand columns, selected by calculating column variances and using only those columns with a variance exceeding a threshold value.
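Selecting the high-variance columns is a one-line operation. The function below keeps the top n columns by variance (the paper's criterion is a variance threshold, which amounts to the same selection for a suitably chosen n), and the random matrix only makes the example runnable.

    import numpy as np

    def trim_semantic_matrix(S, n_columns=4500):
        """Keep the n_columns columns of S with the largest variance."""
        keep = np.argsort(S.var(axis=0))[::-1][:n_columns]
        return S[:, np.sort(keep)]

    # Small random stand-in for the 23,562-column matrix, just to exercise the function.
    S_toy = np.random.default_rng(0).normal(size=(200, 1000)) * np.linspace(0, 1, 1000)
    print(trim_semantic_matrix(S_toy, n_columns=100).shape)    # (200, 100)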

3.2. Validation of the Semantic Vector Space. As shown by Baayen et al. [59] and Milin et al. [58, 77], measures based on matrices such as S are predictive for behavioral measures such as reaction times in the visual lexical decision task, as well as for self-paced reading latencies. In what follows, we first validate the semantic vectors of S on two data sets: one data set with accuracies in paired associate learning, and one dataset with semantic similarity ratings. We then considered specifically the validity of semantic vectors for inflectional and derivational functions, by focusing first on the correlational structure of the pertinent semantic vectors, followed by an examination of the predictivity of the semantic vectors for semantic plausibility and semantic transparency ratings for derived words.

3.2.1. Paired Associate Learning. The paired associate learning (PAL) task is a widely used test in psychology for evaluating learning and memory. Participants are given a list of word pairs to memorize. Subsequently, at testing, they are given the first word and have to recall the second word. The proportion of correctly recalled words is the accuracy measure on the PAL task. Accuracy on the PAL test decreases with age, which has been attributed to cognitive decline over the lifetime. However, Ramscar et al. [78] and Ramscar et al. [79] provide evidence that the test actually measures the accumulation of lexical knowledge. In what follows, we use the data on PAL performance reported by desRosiers and Ivison [80]. We fitted a linear mixed model to accuracy in the PAL task as a function of the Pearson correlation r of paired words' semantic vectors in S (but with weights on the main diagonal included, and using the 4,275 columns with highest variance), with random intercepts for word pairs, sex and age as control variables, and, crucially, an interaction of r by age. Given the findings of Ramscar and colleagues, we expect to find that the slope of r (which is always negative) as a predictor of PAL accuracy increases with age (indicating decreasing accuracy). Table 1 shows that this prediction is borne out. For age group 20–29 (the reference level of age), the slope for r is estimated at 0.31. For the next age level, this slope is adjusted upward by 0.12, and as age increases, these upward adjustments likewise increase, to 0.21, 0.24, and 0.32 for age groups 40–49, 50–59, and 60–69, respectively. Older participants know their language better and hence are more sensitive to the semantic similarity (or lack thereof) of the words that make up PAL test pairs. For the purposes of the present study, the solid support for r as a predictor for PAL accuracy contributes to validating the row vectors of S as semantic vectors.
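The predictor in this analysis is nothing more than the Pearson correlation of the two row vectors of S for each PAL pair. A minimal sketch follows, with a random stand-in for S; fitting the mixed model itself requires the desRosiers and Ivison norms, and statsmodels' mixedlm is mentioned only as one possible stand-in for the original R analysis.

    import numpy as np

    rng = np.random.default_rng(2)

    # Stand-in semantic matrix and word index (the real S has 23,562 lexomes as rows).
    words = ["north", "south", "cabbage", "pen", "obey", "inch"]
    row_index = {w: i for i, w in enumerate(words)}
    S = rng.normal(size=(len(words), 50))

    def pair_r(w1, w2):
        """Pearson correlation of the semantic row vectors of two paired words."""
        return np.corrcoef(S[row_index[w1]], S[row_index[w2]])[0, 1]

    print(pair_r("north", "south"))

    # With the PAL norms in a data frame `pal` (columns: pair, r, age, sex, accuracy),
    # a mixed model analogous to the one in Table 1 could be fitted with, for example,
    # statsmodels:  smf.mixedlm("accuracy ~ r * age + sex", data=pal, groups=pal["pair"]).fit()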

3.2.2. Semantic Relatedness Ratings. We also examined performance of S on the MEN test collection [81], which provides, for 3,000 word pairs, crowd-sourced ratings of semantic similarity. For 2,267 word pairs, semantic vectors are available in S for both words. Figure 1 shows that there is a nonlinear relation between the correlation of words' semantic vectors in S and the ratings. The plot shows as well that for low correlations, the variability in the MEN ratings is larger. We fitted a Gaussian location-scale additive model to this data set, summarized in Table 2, which supported r as a predictor for both the mean MEN rating and the variability in the MEN ratings.

Figure 1: Similarity rating in the MEN dataset as a function of the correlation r between the row vectors in S of the words in the test pairs.

Table 1: Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [-.488, -.253]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

A. parametric coefficients     Estimate   Std. Error   t-value    p-value
intercept                        3.6220       0.6822    5.3096   < 0.0001
r                                0.3058       0.1672    1.8293     0.0691
age=39                           0.2297       0.1869    1.2292     0.2207
age=49                           0.4665       0.1869    2.4964     0.0135
age=59                           0.6078       0.1869    3.2528     0.0014
age=69                           0.8029       0.1869    4.2970   < 0.0001
sex=male                        -0.1074       0.0230   -4.6638   < 0.0001
r:age=39                         0.1167       0.0458    2.5490     0.0117
r:age=49                         0.2090       0.0458    4.5640   < 0.0001
r:age=59                         0.2463       0.0458    5.3787   < 0.0001
r:age=69                         0.3239       0.0458    7.0735   < 0.0001
B. smooth terms                      edf       Ref.df   F-value    p-value
random intercepts word pair      17.8607      18.0000  128.2283   < 0.0001

Table 2: Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location: parameter estimating the mean; scale: parameter estimating the variance through the logb link function (η = log(σ - b)).

A. parametric coefficients     Estimate   Std. Error    t-value    p-value
Intercept [location]            25.2990       0.1799   140.6127   < 0.0001
Intercept [scale]                2.1297       0.0149   143.0299   < 0.0001
B. smooth terms                      edf       Ref.df    F-value    p-value
TPRS r [location]                8.3749       8.8869  2766.2399   < 0.0001
TPRS r [scale]                   5.9333       7.0476    87.5384   < 0.0001

To put this performance in perspective, we collected the latent semantic analysis (LSA) similarity scores for the MEN word pairs using the website at http://lsa.colorado.edu. The Spearman correlation for LSA scores and MEN ratings was 0.697, and that for r was 0.704 (both p < 0.0001). Thus, our semantic vectors perform on a par with those of LSA, a well-established older technique that still enjoys wide use in psycholinguistics. Undoubtedly, optimized techniques from computational linguistics such as word2vec will outperform our model. The important point here is that, even with training on full sentences rather than using small windows, and even when including function words, performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.
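Evaluating S against the MEN ratings amounts to computing r for every covered pair and correlating these values with the human ratings; the toy vectors and ratings below are invented solely to make the sketch self-contained.

    import numpy as np
    from scipy.stats import spearmanr

    def evaluate_against_ratings(S, row_index, triples):
        """Spearman correlation between human similarity ratings and the Pearson
        correlations of the corresponding semantic vectors in S."""
        model_r, human = [], []
        for w1, w2, rating in triples:
            if w1 in row_index and w2 in row_index:
                model_r.append(np.corrcoef(S[row_index[w1]], S[row_index[w2]])[0, 1])
                human.append(rating)
        return spearmanr(model_r, human)

    # Toy example; the real evaluation uses the 2,267 MEN pairs covered by S.
    rng = np.random.default_rng(3)
    S = rng.normal(size=(4, 20))
    row_index = {"sun": 0, "sunlight": 1, "car": 2, "automobile": 3}
    triples = [("sun", "sunlight", 49.0), ("car", "automobile", 48.0), ("sun", "car", 10.0)]
    print(evaluate_against_ratings(S, row_index, triples))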

3.2.3. Correlational Structure of Inflectional and Derivational Vectors. Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner, and tend to be either not correlated with derivational lexomes (indicated by light yellow) or to be negatively correlated (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show, for a majority of pairs, small positive correlations, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3,500 derived words for 31 different formatives, resulting in a 3,500 × 23,562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82]. (For LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3,531 × 814.) LDA accuracy for this classification task with 31 possible outcomes was 0.72. All 31 derivational lexomes were correctly classified.

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across 10 permutations: 0.519–0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.
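The classification analysis can be approximated as follows, with scikit-learn's LinearDiscriminantAnalysis standing in for the lda function of the R MASS package used in the study, and with random data in place of the derived words' semantic vectors.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(4)

    # Stand-ins: rows of D are semantic vectors of derived words (after removing
    # near-zero-variance columns); y holds their formatives (e.g., NESS, ION, ...).
    n_words, n_dims, n_formatives = 500, 80, 31
    y = rng.integers(0, n_formatives, size=n_words)
    centers = rng.normal(size=(n_formatives, n_dims))
    D = centers[y] + rng.normal(scale=2.0, size=(n_words, n_dims))

    lda = LinearDiscriminantAnalysis().fit(D, y)
    print("accuracy:", lda.score(D, y))

    # Permutation baseline: break the relation between vectors and formatives.
    y_perm = rng.permutation(y)
    print("permuted:", LinearDiscriminantAnalysis().fit(D, y_perm).score(D, y_perm))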

Figure 2: Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)

How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and at the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype: this vector is close to the cloud of semantic vectors, but it is outside its periphery. This pattern is not specific to ness, but is found for the other derivational lexomes as well. It is intrinsic to our model.

The reason for this is straightforward. During learning, although for a derived word's lexome i and its derivational


Figure 3. Heatmap for the correlation matrix of the lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)

As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property of the model for capturing morphological productivity in comprehension and speech production.

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed for the vectors obtained by subtracting the vector of the derived word from that of the base word, the result is a vector that is embedded inside the cluster of derived vectors, and hence inherits semantic idiosyncrasies from all these derived words.

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs consisting of a base and a novel derived form (e.g., accent/accentable) reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs due to the constraints that the base words had to occur in the tasa corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases, and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies more strong links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive and were therefore left out of the specification of the model reported here (word frequency was not included as predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by comparing the correlations of (i) the semantic vector of the base word to which the semantic vector of the affix is added, with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = −0.66), effectiveness (r = −0.51), awareness (r = −0.50), loneliness (r = −0.45), sickness (r = −0.44), and consciousness (r = −0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) and its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
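As a minimal sketch, this comparison amounts to a single correlation in R, assuming that S has named rows for both content words and affix lexomes (the row names used here are hypothetical):

# accumulated evidence over base and affix versus the word's own semantic vector
sum_vec <- S["kind", ] + S["NESS", ]
cor(sum_vec, S["kindness", ])   # more transparent words yield higher correlations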

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the tasa corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall, transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.
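A Gaussian location-scale additive model of the kind summarized in Table 4 can be specified with the gaulss family of the mgcv package (mgcv is an assumption here, not named in the text); the sketch below is illustrative only, with hypothetical variable and data frame names (rating, actdiv, wlen, lexome, transparency_data) and with the derivational lexome treated as a random-effect smooth:

library(mgcv)

m <- gam(list(rating ~ te(actdiv, wlen) + s(lexome, bs = "re"),  # location part
                     ~ s(actdiv) + s(wlen)),                     # scale part
         data = transparency_data, family = gaulss())
summary(m)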

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output. (A package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz. The package is described in Baayen et al. [85].) We begin with introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

C = \begin{array}{l|cccccccc}
      & \#wV & wVn & Vn\# & \#tu & tu\# & \#Tr & Tri & ri\# \\ \hline
one   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
two   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
three & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}    (12)

where we use the disc keyboard phonetic alphabet (the "DIstinct Single Character" representation, with one character for each phoneme, introduced by CELEX [86]) for triphones; the symbol # denotes the word boundary.


Figure 4. Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3. GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients              Estimate   Std. Error   t-value   p-value
Intercept                                -1.8072       2.7560   -0.6557    0.5127
Word Length                               0.5901       0.2113    2.7931    0.0057
Activation Diversity                      1.1221       0.6747    1.6632    0.0976
Arousal                                   0.3847       0.1521    2.5295    0.0121
Word Length × Activation Diversity       -0.1318       0.0500   -2.6369    0.0089

B. smooth terms                               edf       Ref.df   F-value   p-value
Random Intercepts Affix                    3.3610       4.0000   55.6723   < 0.0001

Suppose that the semantic vectors for these words are the row vectors of the following matrix S:

S = \begin{array}{l|ccc}
      & one & two & three \\ \hline
one   & 1.0 & 0.3 & 0.4 \\
two   & 0.2 & 1.0 & 0.1 \\
three & 0.1 & 0.1 & 1.0
\end{array}    (13)

We are interested in a transformation matrix F such that

CF = S (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S (15)

For the present example

F = \begin{array}{l|ccc}
      & one  & two  & three \\ \hline
\#wV  & 0.33 & 0.10 & 0.13 \\
wVn   & 0.33 & 0.10 & 0.13 \\
Vn\#  & 0.33 & 0.10 & 0.13 \\
\#tu  & 0.10 & 0.50 & 0.05 \\
tu\#  & 0.10 & 0.50 & 0.05 \\
\#Tr  & 0.03 & 0.03 & 0.33 \\
Tri   & 0.03 & 0.03 & 0.33 \\
ri\#  & 0.03 & 0.03 & 0.33
\end{array}    (16)
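The toy example of equations (12) through (16) can be verified directly in R with the ginv function of the MASS package mentioned above; this is a sketch with the matrices entered by hand (Fmat stands for the F of the text):

library(MASS)

C <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
              0, 0, 0, 1, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"),
                            c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")))
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"), c("one", "two", "three")))

Fmat <- ginv(C) %*% S            # the transformation matrix of equation (15)
dimnames(Fmat) <- list(colnames(C), colnames(S))
round(Fmat, 2)                   # compare with equation (16)
round(C %*% Fmat, 2)             # recovers S for this toy example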

and for this simple example CF is exactly equal to S. In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams.


Table 4. Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words; te: tensor product smooth, s: thin plate regression spline.

A. parametric coefficients                         Estimate   Std. Error    t-value    p-value
intercept [location]                                 5.5016       0.2564    21.4537   < 0.0001
intercept [scale]                                   -0.4135       0.0401   -10.3060   < 0.0001

B. smooth terms                                          edf       Ref.df   F-value    p-value
te(activation diversity, word length) [location]      5.3879       6.2897    23.7798     0.0008
s(derivational lexome) [location]                    14.3180      15.0000   380.6254   < 0.0001
s(activation diversity) [scale]                       2.1282       2.5993    28.4336   < 0.0001
s(word length) [scale]                                1.7722       2.1403   132.7311   < 0.0001

A comparison of the performance of the two models sheds light on the role of words' "sound image" in word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is most strongly correlated with the target semantic vector s_i of all target vectors s_j across all words j.
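In R, this correlation-based evaluation can be sketched as follows; Shat, the matrix of predicted semantic vectors (one row per word), is a hypothetical object name, and the rows of Shat and S are assumed to be in the same order:

r <- cor(t(Shat), t(S))            # word-by-word correlation matrix
best <- max.col(r)                 # for each word, the most strongly correlated target
mean(best == seq_len(nrow(S)))     # proportion of correctly recognized words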

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations) and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, for a total of 11,480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), and (iii)


the word being available in the CELEX database, with the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3465 unique trigrams, resulting in an 11,480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10^-7. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated, in the mean, with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = −0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derivational semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route: From Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping an 11,480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words, the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ1, obtained with network F of the first route, and a vector ŝ2, obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11,480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
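For a single word, the three measures can be computed as in the following sketch (s1 and s2 denote the semantic vectors predicted by the two routes, and w the word's lexome; all object names are hypothetical):

total_activation_diversity <- sum(abs(s1)) + sum(abs(s2))  # sum of the two L1-norms
route_congruency           <- cor(s1, s2)                  # agreement between the two routes
prior                      <- sum(abs(S[, w]))             # L1-norm of the lexome's column vector in S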

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
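The structure of this GAM (cf. Table 5) can be written down as follows, assuming software such as the mgcv package (an assumption; the package is not named in the text) and hypothetical variable and data frame names; invRT stands for the inverse-transformed reaction time:

library(mgcv)

m <- gam(invRT ~ word_type + word_length +
           s(total_activation_diversity) +
           te(route_congruency, prior),
         data = blp_data)
summary(m)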

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel). For medium range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increased by 10059 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increased by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger than the number of trigram features (3465 trigrams versus 5929 triphones).


Figure 5. The partial effects of total activation diversity (left) and of the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5. Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures based on ldl; s: thin plate regression spline smooth, te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error    t-value   p-value
intercept                          -1.7774       0.0087   -205.414   < 0.001
word type: Inflected                0.1110       0.0059     18.742   < 0.001
word type: Monomorphemic            0.0233       0.0054      4.322   < 0.001
word length                         0.0117       0.0011     10.981   < 0.001

B. smooth terms                         edf       Ref.df          F   p-value
s(total activation diversity)         6.002        7.222       23.6   < 0.001
te(route congruency, prior)          14.673       17.850      21.37   < 0.001


Figure 6. The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.

Table 6. Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures from ndl; te: tensor product smooth.

A. parametric coefficients                     Estimate   Std. Error    t-value   p-value
intercept                                       -1.6934       0.0086   -195.933   < 0.001
word type: Inflected                             0.0286       0.0052      5.486   < 0.001
word type: Monomorphemic                         0.0042       0.0055      0.770   = 0.4
word length                                      0.0066       0.0011      5.893   < 0.001

B. smooth terms                                      edf       Ref.df         F   p-value
te(activation, activation diversity, prior)        41.34        49.72     10.84   < 0.001

More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations; https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs), introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL-scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk, and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the ucla library broadcast newsscape data, a vast repository of multimodal TV news broadcasts, provided to us by the Distributed Little Red Hen Lab.


Figure 7. Oscillogram for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).


Figure 8. Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown on the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

The audio files of this resource were automatically classified as clean for relatively clean parts, where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, for a total of 131,673 word tokens (representing 4779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle, and vice versa; thus, the transpose of the matrix with rows (3, 8) and (7, 2) is the matrix with rows (3, 7) and (8, 2). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus, a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

CF = S
C^T CF = C^T S
(C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
F = (C^T C)^{-1} C^T S    (17)

In this way, matrix inversion is required for a much smaller matrix, C^T C, which is a square matrix of size 40,639 × 40,639.
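In R, this amounts to solving the normal equations rather than computing a generalized inverse of the full cue matrix; a one-line sketch, with Ca and S the audio cue and semantic matrices (hypothetical object names), is given below. In practice, a regularized or generalized inverse may be needed if C^T C is not of full rank.

Fmat <- solve(crossprod(Ca), crossprod(Ca, S))   # F = (C^T C)^{-1} C^T S, equation (17)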

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing a classification task with 11,480 classes is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

S = \begin{array}{l|ccc}
      & one & two & three \\ \hline
one   & 1.0 & 0.3 & 0.4 \\
two   & 0.2 & 1.0 & 0.1 \\
three & 0.1 & 0.1 & 1.0
\end{array}    (18)

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T = \begin{array}{l|cccccccc}
      & \#wV & wVn & Vn\# & \#tu & tu\# & \#Tr & Tri & ri\# \\ \hline
one   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
two   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
three & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}    (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T (22)

we have

G = S′T (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂ (24)

For the present example, the generalized inverse S′ of S is

S′ = \begin{array}{l|ccc}
      &  one  &  two  & three \\ \hline
one   &  1.10 & -0.29 & -0.41 \\
two   & -0.21 &  1.07 & -0.02 \\
three & -0.09 & -0.08 &  1.04
\end{array}    (25)

and the transformation matrix G is

G = \begin{array}{l|cccccccc}
      & \#wV  & wVn   & Vn\#  & \#tu  & tu\#  & \#Tr  & Tri   & ri\#  \\ \hline
one   &  1.10 &  1.10 &  1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \\
two   & -0.21 & -0.21 & -0.21 &  1.07 &  1.07 & -0.02 & -0.02 & -0.02 \\
three & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 &  1.04 &  1.04 &  1.04
\end{array}    (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
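Continuing the toy example in R (reusing the C and S matrices of the earlier sketch, with the triphone matrix of equation (19) identical to the toy C matrix):

library(MASS)

Tmat <- C                        # the triphone matrix T of equation (19)
G    <- ginv(S) %*% Tmat         # equation (23)
dimnames(G) <- list(rownames(S), colnames(Tmat))
round(G, 2)                      # compare with equation (26)
round(S %*% G)                   # recovers the triphone matrix T
round(S["two", , drop = FALSE] %*% G, 2)   # triphone support given the meaning of 'two'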

We made use of the S matrix which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m); those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
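The following sketch illustrates the thresholding-plus-longest-path procedure just described, using the igraph package; it is a simplified stand-in for the algorithm detailed in the Appendix, and the helper assumes triphones written with # as the word boundary symbol (function and argument names are hypothetical):

library(igraph)

order_triphones <- function(active) {
  # 'active' is the set of triphones whose predicted support exceeds the threshold.
  # Connect triphone "abc" to "bcd" when their segments overlap.
  edges <- do.call(rbind, lapply(active, function(a) {
    succ <- active[substr(active, 1, 2) == substr(a, 2, 3)]
    if (length(succ) > 0) cbind(from = a, to = succ) else NULL
  }))
  g <- graph_from_data_frame(as.data.frame(edges),
                             vertices = data.frame(name = active))
  starts <- active[substr(active, 1, 1) == "#"]    # word-initial triphones
  ends   <- active[substr(active, 3, 3) == "#"]    # word-final triphones
  paths  <- unlist(lapply(starts, function(s)
                     all_simple_paths(g, from = s, to = ends)),
                   recursive = FALSE)
  as_ids(paths[[which.max(lengths(paths))]])       # longest candidate path
}

order_triphones(c("Vn#", "#wV", "wVn"))   # returns the ordered triphones of 'one'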

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m,a} from S_m and add the semantic vector s_a of the affix:

S_a = S_{m,a} + i ⊗ s_a    (27)

Here, i is a unit vector of length k and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a    (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

\begin{bmatrix} S_m \\ S_a \end{bmatrix} G_a = \begin{bmatrix} T_m \\ T_a \end{bmatrix}    (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

\begin{bmatrix} S_m \\ S_{a_1} \\ S_{a_2} \\ \vdots \\ S_{a_n} \end{bmatrix} G = \begin{bmatrix} T_m \\ T_{a_1} \\ T_{a_2} \\ \vdots \\ T_{a_n} \end{bmatrix}    (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
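In R, the augmented mapping can be set up by adding the affix vector to the pertinent base-word vectors and stacking the resulting blocks, as in the following sketch (Sm, Tm, Ta_plural, plural_bases, and s_plural are hypothetical object names, shown here for a single inflectional function):

library(MASS)

# semantic vectors for, e.g., plural forms: base vectors plus the plural vector, equation (27)
S_plural <- sweep(Sm[plural_bases, ], 2, s_plural, `+`)

# stack base words and inflected words into augmented matrices, equation (30)
S_aug <- rbind(Sm, S_plural)
T_aug <- rbind(Tm, Ta_plural)

G <- ginv(S_aug) %*% T_aug   # one mapping covering all inflectional functions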

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where as before we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before the semantic vectors for inflected words wereobtained by addition of the corresponding content and inflec-tional semantic vectors For each training set we calculatedthe transformationmatrixG from the S andTmatrices of thattraining set For an out-of-bag inflected form in the test setwe calculated its semantic vector and multiplied this vectorwith the transformation matrix (using (21)) to obtain thepredicted triphone vector t

The proportion of forms that were predicted correctlywas 062 The proportion of forms ranked second was 025Forms that were incorrectly ranked first typically were otherinflected forms (including bare stems) that happened toreceive stronger support than the targeted form Such errorsare not uncommon in spoken English For instance inthe Buckeye corpus [110] closest is once reduced to closFurthermore Dell [90] classified forms such as concludementfor conclusion and he relax for he relaxes as (noncontextual)errors

The algorithm failed to produce the targeted form for3 of the cases Examples of the forms produced instead arethe past tense for blazed being realized with [zId] insteadof [zd] the plural mouths being predicted as having [Ts] ascoda rather than [Ds] and finest being reduced to finst Thevoiceless production formouth does not follow the dictionarynorm but is used as attested by on-line pronunciationdictionaries Furthermore the voicing alternation is partlyunpredictable (see eg [128] for final devoicing in Dutch)and hence model performance here is not unreasonable Wenext consider model accuracy for derived words

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests that variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
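A minimal sketch of such a model specification, assuming the mgcv package and a data frame with the column names used below (all names are ours, not those of the original analysis), might look as follows.

```r
library(mgcv)
# log relative duration modeled with a linear effect of LDL edge weight,
# control covariates, a thin plate regression spline for log frequency,
# and random intercepts for word and speaker (cf. Table 7)
m <- bam(logRelativeDuration ~ logEdgeWeight + PhraseRate + logNcount +
           s(logFrequency) +                               # smooth trend
           s(word, bs = "re") + s(speaker, bs = "re"),     # random intercepts
         data = buckeye)
summary(m)
```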

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133-138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs [139-141]).


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients     Estimate    Std. Error    t-value     p-value
Intercept                       -1.9165        0.0592   -32.3908    < 0.0001
Log LDL Edge Weight             -0.1358        0.0400    -3.3960      0.0007
Phrase Rate                      0.0047        0.0021     2.2449      0.0248
Log Ncount                       0.1114        0.0189     5.8837    < 0.0001

B. smooth terms                      edf        Ref.df    F-value     p-value
TPRS log Frequency                1.9007        1.9050     7.5495      0.0004
random intercepts word         1149.8839     1323.0000    31.4919    < 0.0001
random intercepts speaker        30.2995       39.0000    34.5579    < 0.0001

TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can easily be enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
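In R, such a layout can be obtained directly from igraph; in the sketch below, g is assumed to be the triphone graph with its LDL edge weights already attached.

```r
library(igraph)
# self-organize the vertices of the triphone graph in the plane with graphopt
coords <- layout_with_graphopt(g, charge = 0.001, mass = 30)
plot(g, layout = coords, edge.label = round(E(g)$weight, 2))
```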

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to prevent graphs from becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one: it is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C specifying, for each word (John, Clark, wrote, ...), which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
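The following toy sketch illustrates the computation; the objects C, L, and S are assumed to have been set up as described above, and the row names are hypothetical placeholders for this example.

```r
library(MASS)
# C: word-by-trigram cue matrix; L: the words' targeted semantic vectors
# (built from their lexomes); S: lexome-by-dimension semantic matrix
F     <- ginv(C) %*% L          # solve CF = L in the least-squares sense
L_hat <- C %*% F                # estimated semantic vectors for the word forms
# support for the named entity JohnClark given the form "John"
cor(L_hat["John", ], S["JohnClark", ])
```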

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about which named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

Figure 10: Correlations of the estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see [47] for some evidence). One such interaction is explored in Baayen et al. [85]: in their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].)


Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
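As an illustration, a single Widrow-Hoff update of a cue-to-outcome weight matrix can be sketched in a few lines of R; the function and argument names are ours, and the learning rate is an arbitrary placeholder.

```r
# One incremental learning event: nudge the weights so that the prediction
# for this event's cue vector moves toward its target (outcome) vector.
update_widrow_hoff <- function(W, cue, target, rate = 0.001) {
  cue       <- matrix(cue, nrow = 1)            # treat the cue vector as a row vector
  predicted <- cue %*% W                        # current prediction for this event
  W + rate * t(cue) %*% (target - predicted)    # delta-rule update of the weights
}
```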

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155-157], and how many affixal position slots are required for Athabascan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the southeast of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapivissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning on the other.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lowercase letters in bold, and we place the x coordinate of a point next to the y coordinate:

a = (3, 4),  b = (-2, -3).    (A.1)


Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}    (A.2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}    (A.3)

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} =
\begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}    (A.4)

Such a mapping of A onto B is given by a matrix F

\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}    (A.5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} =
\begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}    (A.6)

Using matrix notation, we can write

\mathbf{A}\mathbf{F} = \mathbf{B}    (A.7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} =
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} =
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}    (A.8)

For numbers, the inverse of multiplication with s is dividing by s:

\left( \frac{1}{s} \right) (s x) = x    (A.9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}    (A.10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}    (A.11)

We find the square matrix F that maps the square matrix A onto B as follows:

\mathbf{A}\mathbf{F} = \mathbf{B}
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}
\mathbf{I}\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}
\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}    (A.12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A.6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find the matrix G which maps B onto A, i.e.,

\mathbf{B}\mathbf{G} = \mathbf{A}    (A.13)

Solving this equation proceeds exactly as above

\mathbf{B}\mathbf{G} = \mathbf{A}
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}
\mathbf{I}\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}
\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}    (A.14)


The inverse of B is

\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}    (A.15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} =
\begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G}    (A.16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X'.
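The worked example above can be reproduced in a few lines of R; since A and B happen to be invertible, ginv() simply recovers the exact inverses here. This is only an illustrative sketch of the algebra, not code from the study itself.

```r
library(MASS)                          # for ginv()
A <- matrix(c(3, -2, 4, -3), 2, 2)     # rows: the form vectors a and b
B <- matrix(c(-5, 4, 2, -1), 2, 2)     # rows: the semantic vectors x and y
F <- ginv(A) %*% B                     # comprehension mapping; equals (A.5)
A %*% F                                # reproduces B, cf. (A.6)
G <- ginv(B) %*% A                     # production mapping, cf. (A.16)
c(2, 2) %*% F                          # the novel form vector s is mapped onto (-2, 2)
```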

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words and the columns the features of these words. For instance, the row vectors of A can represent words' form features and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

\begin{pmatrix} 3 & 4 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} =
\begin{pmatrix} -5 & 2 \end{pmatrix}    (A.17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i_1 and i_2 are set to 3 and 4, respectively. To obtain the activations o_1 = -5 and o_2 = 2 for the output vector (-5, 2), we accumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o_1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o_2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\mathbf{s} = (2, 2)    (A.18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

\begin{pmatrix} 2 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} =
\begin{pmatrix} -2 & 2 \end{pmatrix}    (A.19)

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i_1 and i_2 (in blue), and two output nodes, o_1 and o_2 (in red). When the input vector (3, 4) is presented to the network, the value at o_1 is 3 × 1 + 4 × (-2) = -5, and that at o_2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
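A schematic R implementation of this heuristic, using the igraph package, is sketched below. The objects edges, initial_vertices, and final_vertices are assumed to have been assembled from the thresholded triphone supports, and the raw path support is simplified here to the sum of the edge weights; this is a sketch under those assumptions, not the exact code of the study.

```r
library(igraph)
# edges: data frame with columns 'from', 'to', and 'weight' (network support),
# restricted to the preselected vertices (hypothetical object names)
g     <- graph_from_data_frame(edges, directed = TRUE)
paths <- unlist(lapply(initial_vertices, function(v)
             all_simple_paths(g, from = v, to = final_vertices)),
           recursive = FALSE)
# weight the raw support of each path by its length and pick the best path
support <- sapply(paths, function(p) sum(E(g, path = p)$weight) / length(p))
best    <- paths[[which.max(support)]]
```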

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.


Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1-82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221-242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148-203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890-894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245-248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272-11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106-154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13-22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583-602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543-555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531-573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305-341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89-97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010

36 Complexity

[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983

Complexity 37

[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005

38 Complexity

[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018

Complexity 39

[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R Jackendoff Semantic StructuresMITPress Cambridge 1990[155] J J McCarthy ldquoA prosodic theory of non-concatenative mor-

phologyrdquo Linguistic Inquiry vol 12 pp 373ndash418 1981[156] A Ussishkin ldquoA Fixed Prosodic Theory of Nonconcatenative

Templaticmorphologyrdquo Natural Language amp Linguistic Theoryvol 23 no 1 pp 169ndash218 2005

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 2: The Discriminative Lexicon: A Unified Computational Model ...downloads.hindawi.com/journals/complexity/2019/4895891.pdf · as developed within computational linguistics and com-puter

2 Complexity

A second response is to interpret the units on thehidden layers of deep learning networks as capturing therepresentations and their hierarchical organization familiarfrom standard linguistic frameworks The dangers inherentin this approach are well illustrated by the deep learningmodel proposed by Hannagan et al [12] for lexical learningin baboons [13] The hidden layers of their deep learningnetwork were claimed to correspond to areas in the ventralpathway of the primate brain However Scarf et al [14]reported that pigeons can also learn to discriminate betweenEnglish words and nonword strings Given that the avianbrain is organized very differently from the primate brain andyet accomplishes the same task the claim that the layers ofHannagan et alrsquos deep learning network correspond to areasin the ventral pathway must be premature FurthermoreLinke et al [15] showed that baboon lexical learning can bemodeled much more precisely by a two-layer wide learningnetwork It is also noteworthy that while for vision someform of hierarchical layering of increasingly specific receptivefields is well established [16] it is unclear whether similarneural organization characterizes auditory comprehensionand speech production

A third response is to take the ground-breaking resultsfrom machine learning as a reason for rethinking againstthe backdrop of linguistic domain knowledge and at thefunctional level the nature of the algorithms that underlielanguage learning and language processing

The present study presents the results of an ongoingresearch program that exemplifies the third kind of responsenarrowed down to the lexicon and focusing on those algo-rithms that play a central role in making possible the basicsof visual and auditory comprehension as well as speechproduction The model that we propose here brings togetherseveral strands of research across theoretical morphologypsychology and machine learning Our model can be seenas a computational implementation of paradigmatic analogyin word and paradigm morphology From psychology ourmodel inherits the insight that classes and categories areconstantly recalibrated as experience unfolds and that thisrecalibration can be captured to a considerable extent by verysimple principles of error-driven learning We adopted end-to-end modeling from machine learning but we kept ournetworks as simple as possible as the combination of smartfeatures and linear mappings is surprisingly powerful [17 18]Anticipating discussion of technical details we implementlinear networks (mathematically linear mappings) that arebased entirely on discrimination as learning mechanismand that work with large numbers of features at muchlower levels of representation than in current and classicalmodels

Section 2 provides the theoretical background for thisstudy and introduces the central ideas of the discriminativelexicon that are implemented in subsequent sections Sec-tion 3 introduces our operationalization of lexical semanticsSection 4 discusses visual and auditory comprehension andSection 5 presents how we approach the modeling of speechproductionThis is followed by a brief discussion of how timecan be brought into themodel (Section 6) In the final sectionwe discuss the implications of our results

2 Background

This section lays out some prerequisites that we will buildon in the remainder of this study We first introduce Wordand Paradigm morphology an approach to word structuredeveloped in theoretical morphology within the broadercontext of linguistics Word and Paradigm morphologyprovides the background for our highly critical evaluationof the morpheme as theoretical unit The many problemsassociated with morphemes motivated the morpheme-freecomputational model developed below Section 22 intro-duces the idea of vector representations for word meaningsas developed within computational linguistics and com-puter science Semantic vectors lie at the heart of how wemodel word meaning The next Section 23 explains howwe calculated the semantic vectors that we used in thisstudy Section 24 discusses naive discriminative learningand explains why when learning the relation between formvectors and semantic vectors it is advantageous to use lineardiscriminative learning instead of naive discriminative learn-ing The last subsection explains how linear discriminativelearning provides a mathematical formalization of the notionof proportional analogy that is central toWord and Paradigmmorphology

21 Word and Paradigm Morphology The dominant view ofthemental lexicon in psychology and cognitive science is wellrepresented by Zwitserlood [19 p 583]

Parsing and composition mdash for which there isample evidence from many languages mdash requiremorphemes to be stored in addition to infor-mation as to how morphemes are combinedor to whole-word representations specifying thecombination

Words are believed to be built from morphemes either byrules or by constructional schemata [20] and the meaningsof complexwords are assumed to be a compositional functionof the meanings of their parts

This perspective on word structure which has found itsway into almost all introductory textbooks on morphologyhas its roots in post-Bloomfieldian American structuralism[21] However many subsequent studies have found thisperspective to be inadequate [22] Beard [23] pointed outthat in language change morphological form and mor-phological meaning follow their own trajectories and thetheoretical construct of the morpheme as a minimal signcombining form and meaning therefore stands in the wayof understanding the temporal dynamics of language Beforehim Matthews [24 25] had pointed out that the inflectionalsystem of Latin is not well served by analyses positing thatits fusional system is best analyzed as underlying agglu-tinative (ie as a morphological system in which wordsconsist of sequences of morphemes as (approximately) inTurkish see also [26]) Matthews argued that words are thebasic units and that proportional analogies between wordswithin paradigms (eg walkwalks = talktalks) make thelexicon as a system productive By positing the word asbasic unit Word and Paradigm morphology avoids a central

Complexity 3

problem that confronts morpheme-based theories namelythat systematicities in form can exist without correspondingsystematicities in meaning Minimal variation in form canserve to distinguish words or to predict patterns of variationelsewhere in a paradigm or class One striking exampleis given by the locative cases in Estonian which expressmeanings that in English would be realized with locativeand directional prepositions Interestingly most plural caseendings in Estonian are built on the form of the partitivesingular (a grammatical case that can mark for instancethe direct object) However the semantics of these pluralcase forms do not express in any way the semantics of thesingular and the partitive For instance the form for thepartitive singular of the noun for lsquolegrsquo is jalga The formfor expressing lsquoon the legsrsquo jalgadele takes the form of thepartitive singular and adds the formatives for plural and thelocative case for lsquoonrsquo the so-called adessive (see [21 27]for further details) Matthews [28 p 92] characterized theadoption by Chomsky and Halle [29] of the morpheme aslsquoa remarkable tribute to the inertia of ideasrsquo and the samecharacterization pertains to the adoption of themorpheme bymainstream psychology and cognitive science In the presentstudy we adopt from Word and Paradigm morphology theinsight that the word is the basic unit of analysis Wherewe take Word and Paradigm morphology a step furtheris in how we operationalize proportional analogy As willbe laid out in more detail below we construe analogies asbeing system-wide rather than constrained to paradigms andour mathematical formalization better integrates semanticanalogies with formal analogies

22 Distributional Semantics The question that arises at thispoint is what word meanings are In this study we adoptthe approach laid out by Landauer and Dumais [30] andapproximate word meanings by means of semantic vectorsreferred to as embeddings in computational linguisticsWeaver [31] and Firth [32] noted that words with similardistributions tend to have similar meanings This intuitioncan be formalized by counting how often words co-occuracross documents or within some window of text In thisway a wordrsquos meaning comes to be represented by a vectorof reals and the semantic similarity between two words isevaluated by means of the similarities of their vectors Onesuch measure is the cosine similarity of the vectors a relatedmeasure is the Pearson correlation of the vectors Manyimplementations of the same general idea are available suchas hal [33] HiDEx [34] and word2vec [35] Below weprovide further detail on how we estimated semantic vectors

There are two ways in which semantic vectors can beconceptualized within the context of theories of the mentallexicon First semantic vectors could be fixed entities thatare stored in memory in a way reminiscent of a standardprinted or electronic dictionary the entries of which consistof a search key (a wordrsquos form) and a meaning specification(the information accessed through the search key) Thisconceptualization is very close to the currently prevalentway of thinking in psycholinguistics which has adopted aform of naive realism in which word meanings are typicallyassociated with monadic concepts (see eg [36ndash39]) It is

worth noting that when one replaces these monadic conceptsby semantic vectors the general organization of (paper andelectronic) dictionaries can still be retained resulting inresearch questions addressing the nature of the access keys(morphemes whole-words perhaps both) and the processof accessing these keys

However there is good reason to believe that wordmeanings are not fixed static representations The literatureon categorization indicates that the categories (or classes)that arise as we interact with our environment are constantlyrecalibrated [40ndash42] A particularly eloquent example isgiven by Marsolek [41] In a picture naming study hepresented subjects with sequences of two pictures He askedsubjects to say aloud the name of the second picture Thecritical manipulation concerned the similarity of the twopicturesWhen the first picture was very similar to the second(eg a grand piano and a table) responses were slowercompared to the control condition in which visual similaritywas reduced (orange and table) What we see at work hereis error-driven learning when understanding the picture ofa grand piano as signifying a grand piano features such ashaving a large flat surface are strengthened to the grand pianoandweakened to the table As a consequence interpreting thepicture of a table has become more difficult which in turnslows the word naming response

23 Discrimination Learning of Semantic Vectors What isrequired therefore is a conceptualization of word meaningsthat allows for the continuous recalibration of thesemeaningsas we interact with the world To this end we constructsemantic vectors using discrimination learning We take thesentences from a text corpus and for each sentence in theorder in which these sentences occur in the corpus train alinear network to predict the words in that sentence fromthe words in that sentence The training of the network isaccomplished with a simplified version of the learning ruleof Rescorla and Wagner [43] basically the learning rule ofWidrow and Hoff [44] We denote the connection strengthfrom input word 119894 to output word 119895 by 119908119894119895 Let 120575119894 denote anindicator variable that is 1 when input word 119894 is present insentence 119905 and zero otherwise Likewise let 120575119895 denotewhetheroutput word 119895 is present in the sentence Given a learning rate120588 (120588 ≪ 1) the change in the connection strength from (input)word 119894 to (output) word 119895 for sentence (corpus time) 119905 Δ 119894119895 isgiven by Δ 119894119895 = 120575119894120588 (120575119895 minus sum

119896

120575119896119908119896119895) (1)

where 120575119894 and 120575119895 vary from sentence to sentence dependingon which cues and outcomes are present in a sentence Given119899 distinct words an 119899 times 119899 network is updated incrementallyin this way The row vectors of the networkrsquos weight matrixS define wordsrsquo semantic vectors Below we return in moredetail to how we derived the semantic vectors used in thepresent study Importantly we now have semantic represen-tations that are not fixed but dynamic they are constantlyrecalibrated from sentence to sentence (the property ofsemantic vectors changing over time as experience unfolds

4 Complexity

is not unique to the present approach but is shared withother algorithms for constructing semantic vectors such asword2vec However time-variant semantic vectors are instark contrast to the monadic units representing meaningsin models such as proposed by Baayen et al [45] Taft [46]Levelt et al [37] The distributed representation for wordsrsquomeanings in the triangle model [47] were derived fromWordNet and hence are also time-invariant) Thus in whatfollows a wordrsquos meaning is defined as a semantic vectorthat is a function of time it is subject to continuous changeBy itself without context a wordrsquos semantic vector at time 119905reflects its lsquoentanglementrsquo with other words

Obviously it is extremely unlikely that our understandingof the classes and categories that we discriminate betweenis well approximated just by lexical cooccurrence statisticsHowever we will show that text-based semantic vectors aregood enough as a first approximation for implementing acomputationally tractable discriminative lexicon

Returning to the learning rule (1) we note that it iswell-known to capture important aspects of the dynamics ofdiscrimination learning [48 49] For instance it captures theblocking effect [43 50 51] as well as order effects in learning[52] The blocking effect is the finding that when one featurehas been learned to perfection adding a novel feature doesnot improve learning This follows from (1) which is that as119908119894119895 tends towards 1 Δ 119894119895 tends towards 0 A novel featureeven though by itself it is perfectly predictive is blocked fromdeveloping a strong connection weight of its own The ordereffect has been observed for eg the learning of words forcolors Given objects of fixed shape and size but varying colorand the corresponding color words it is essential that theobjects are presented to the learner before the color wordThis ensures that the objectrsquos properties become the featuresfor discriminating between the color words In this caseapplication of the learning rule (1) will result in a strongconnection between the color features of the object to theappropriate color word If the order is reversed the colorwords become features predicting object properties In thiscase application of (1) will result in weights from a color wordto object features that are proportional to the probabilities ofthe objectrsquos features in the training data

Given the dynamics of discriminative learning staticlexical entries are not useful Anticipating more detaileddiscussion below we argue that actually the whole dictio-nary metaphor of lexical access is misplaced and that incomprehensionmeanings are dynamically created from formand that in production forms are dynamically created frommeanings Importantly what forms andmeanings are createdwill vary from case to case as all learning is fundamentallyincremental and subject to continuous recalibration In orderto make the case that this is actually possible and computa-tionally feasible we first introduce the framework of naivediscriminative learning discuss its limitations and then layout how this approach can be enhanced to meet the goals ofthis study

24 Naive Discriminative Learning Naive discriminativelearning (ndl) is a computational framework grounded

in learning rule (1) that was inspired by prior work ondiscrimination learning (eg [52 53]) A model for readingcomplex words was introduced by Baayen et al [54] Thisstudy implemented a two-layer linear network with letterpairs as input features (henceforth cues) and units repre-senting meanings as output classes (henceforth outcomes)Following Word and Paradigm morphology this model doesnot include form representations for morphemes or wordforms It is an end-to-end model for the relation betweenform and meaning that sets itself the task to predict wordmeanings from sublexical orthographic features Baayen etal [54] were able to show that the extent to which cuesin the visual input support word meanings as gaugedby the activation of the corresponding outcomes in thenetwork reflects a wide range of phenomena reported inthe lexical processing literature including the effects offrequency of occurrence family size [55] relative entropy[56] phonaesthemes (nonmorphemic sounds with a con-sistent semantic contribution such as gl in glimmer glowand glisten) [57] and morpho-orthographic segmentation[58]

However the model of Baayen et al [54] adopted a starkform of naive realism albeit just for reasons of computationalconvenience A wordrsquos meaning was defined in terms of thepresence or absence of an outcome (120575119895 in (1)) Subsequentwork sought to address this shortcoming by developingsemantic vectors using learning rule (1) to predict wordsfrom words as explained above [58 59] The term ldquolexomerdquowas introduced as a technical term for the outcomes ina form-to-meaning network conceptualized as pointers tosemantic vectors

In this approach lexomes do double duty In a firstnetwork lexomes are the outcomes that the network istrained to discriminate between given visual input In asecond network the lexomes are the lsquoatomicrsquo units in acorpus that serve as the input for building a distributionalmodel with semantic vectors Within the theory unfoldedin this study the term lexome refers only to the elementsfrom which a semantic vector space is constructed Thedimension of this vector space is equal to the number oflexomes and each lexome is associated with its own semanticvector

It turns out however that mathematically a networkdiscriminating between lexomes given orthographic cues asin naive discriminative learning models is suboptimal Inorder to explain why the set-up of ndl is suboptimal weneed some basic concepts and notation from linear algebrasuch as matrix multiplication and matrix inversion Readersunfamiliar with these concepts are referred to Appendix Awhich provides an informal introduction

241 Limitations of Naive Discriminative Learning Math-ematically naive discrimination learning works with twomatrices a cue matrix C that specifies wordsrsquo form featuresand a matrix S that specifies the targeted lexomes [61] Weillustrate the cuematrix for four words aaa aab abb and abband their letter bigrams aa ab and bb A cell 119888119894119895 is 1 if word 119894has bigram 119895 and zero otherwise

Complexity 5

C =aa ab bb

aaaaababbabb

( 11000111

0011 ) (2)

The third and fourth words are homographs they shareexactly the same bigrams We next define a target matrix Sthat defines for each of the four words the correspondinglexome 120582 This is done by setting one bit to 1 and all otherbits to zero in the row vectors of this matrix

S =1205821 1205822 1205823 1205824

aaaaababbabb

( 10000100

00100001 ) (3)

The problem that arises at this point is that the word formsjointly set up a space with three dimensions whereas thetargeted lexomes set up a four-dimensional space This isproblematic for the following reason A linear mapping of alower-dimensional spaceA onto a higher-dimensional spaceB will result in a subspace of B with a dimensionality thatcannot be greater than that of A [62] If A is a space in R2

and B is a space in R3 a linear mapping of A onto B willresult in a plane inB All the points inB that are not on thisplane cannot be reached fromA with the linear mapping

As a consequence it is impossible for ndl to perfectlydiscriminate between the four lexomes (which set up a four-dimensional space) given their bigram cues (which jointlydefine a three-dimensional space) For the input bb thenetwork therefore splits its support equally over the lexomes1205823 and 1205824 The transformation matrix F is

F = C1015840S = 1205821 1205822 1205823 1205824aaabbb

( 1minus1101minus1

00050005 ) (4)

and the matrix with predicted vectors S is

S = CF =1205821 1205822 1205823 1205824

aaaaababbabb

( 10000100

000505000505 ) (5)

When a cuematrixC and a target matrix S are set up for largenumbers of words the dimension of the ndl cue space willbe substantially smaller than the space of S the dimensionof which is equal to the number of lexomes Thus in the set-up of naive discriminative learning we do take into accountthat words tend to have similar forms but we do not take into

account that words are also similar in meaning By settingup S as completely orthogonal we are therefore making itunnecessarily hard to relate form to meaning

This study therefore implements discrimination learningwith semantic vectors of reals replacing the one-bit-on rowvectors of the target matrix S By doing so we properlyreduce the dimensionality of the target space which in turnmakes discriminative learningmore accurateWewill refer tothis new implementation of discrimination learning as lineardiscriminative learning (ldl) instead of naive discriminativelearning as the outcomes are no longer assumed to beindependent and the networks aremathematically equivalentto linear mappings onto continuous target matrices

25 Linear Transformations and Proportional Analogy Acentral concept in word and paradigm morphology is thatthe form of a regular inflected form stands in a relationof proportional analogy to other words in the inflectionalsystem As explained by Matthews ([25] p 192f)

In effect we are predicting the inflections of servusby analogy with those of dominus As GenitiveSingular domini is to Nominative Singular domi-nus so x (unknown) must be to NominativeSingular servus What then is x Answer it mustbe servi In notation dominus domini = servusservi ([25] p 192f)

Here form variation is associated with lsquomorphosyntacticfeaturesrsquo Such features are often naively assumed to functionas proxies for a notion of lsquogrammatical meaningrsquo Howeverin word and paradigm morphology these features actu-ally represent something more like distribution classes Forexample the accusative singular forms of nouns belongingto different declensions will typically differ in form and beidentified as the lsquosamersquo case in different declensions by virtueof distributional parallels The similarity in meaning thenfollows from the distributional hypothesis which proposesthat linguistic items with similar distributions have similarmeanings [31 32] Thus the analogy of forms

dominus domini = servus servi (6)

is paralleled by an analogy of distributions 119889119889 (dominus) 119889 (domini) = 119889 (servus) 119889 (servi) (7)

Borrowing notational conventions of matrix algebra we canwrite

(dominusdominiservusservi

) sim (119889 (dominus)119889 (domini)119889 (servus)119889 (servi) ) (8)

In this study we operationalize the proportional analogy ofword and paradigmmorphology by replacing the word formsin (8) by vectors of features that are present or absent in aword and by replacing wordsrsquo distributions by semantic vec-tors Linear mappings between form and meaning formalize

6 Complexity

two distinct proportional analogies one analogy for goingfrom form to meaning and a second analogy for going frommeaning to form Consider for instance a semantic matrix Sand a form matrix C with rows representing words

S = (1198861 11988621198871 1198872)C = (1199011 11990121199021 1199022) (9)

The transformation matrix G that maps S onto C111988611198872 minus 11988621198871 ( 1198872 minus1198862minus1198871 1198861 )⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟Sminus1

(1199011 11990121199022 1199022)⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟C= 111988611198872 minus 11988621198871 [( 11988721199011 11988721199012minus11988711199011 minus11988711199012) + (minus11988621199021 minus1198862119902211988611199021 11988611199022 )]⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟

G

(10)

can be written as the sum of two matrices (both of whichare scaled by 11988611198872 minus 11988621198871) The first matrix takes the firstform vector of C and weights it by the elements of thesecond semantic vector The second matrix takes the secondform vector of C and weights this by the elements of thefirst semantic vector of S Thus with this linear mapping apredicted form vector is a semantically weighted mixture ofthe form vectors of C

The remainder of this paper is structured as follows Wefirst introduce the semantic vector space S that we will beusing which we derived from the tasa corpus [63 64] Nextwe show that we canmodel word comprehension bymeans ofa matrix (or network) F that transforms the cue row vectorsof C into the semantic row vectors of S ie CF = S We thenshow that we can model the production of word forms giventheir semantics by a transformation matrix G ie SG = CwhereC is a matrix specifying for each word which triphonesit contains As the network is informative only about whichtriphones are at issue for a word but remains silent abouttheir order an algorithm building on graph theory is usedto properly order the triphones All models are evaluated ontheir accuracy and are also tested against experimental data

In the general discussion we will reflect on the conse-quences for linguistic theory of our finding that it is indeedpossible to model the lexicon and lexical processing usingsystemic discriminative learning without requiring lsquohiddenunitsrsquo such as morphemes and stems

3 A Semantic Vector Space Derived fromthe TASA Corpus

The first part of this section describes how we constructed semantic vectors. Specific to our approach is that we constructed semantic vectors not only for content words, but also for inflectional and derivational functions. The semantic vectors for these functions are of especial importance for the speech production component of the model. The second part of this section addresses the validation of the semantic vectors, for which we used paired associate learning scores, semantic relatedness ratings, and semantic plausibility and semantic transparency ratings for derived words.

3.1. Constructing the Semantic Vector Space. The semantic vector space that we use in this study is derived from the tasa corpus [63, 64]. We worked with 752,130 sentences from this corpus, with a total of 10,719,386 word tokens. We used the treetagger software [65] to obtain for each word token its stem and a part-of-speech tag. Compounds written with a space, such as apple pie and jigsaw puzzle, were, when listed in the celex lexical database [66], joined into one onomasiological unit. Information about words' morphological structure was retrieved from celex.

In computational morphology, several options have been explored for representing the meaning of affixes. Mitchell and Lapata [67] proposed to model the meaning of an affix as a vector that, when added to the vector b of a base word, gives the vector d of the derived word. They estimated this vector by calculating d_i - b_i for all available pairs i of base and derived words, and taking the average of the resulting vectors. Lazaridou et al. [68] and Marelli and Baroni [60] modeled the semantics of affixes by means of matrix operations that take b as input and produce d as output, i.e.,

d = Mb (11)

whereas Cotterell et al. [69] constructed semantic vectors as latent variables in a Gaussian graphical model. What these approaches have in common is that they derive or impute semantic vectors given the semantic vectors of words produced by algorithms such as word2vec, algorithms which work with (stemmed) words as input units.

From the perspective of discriminative linguistics, however, it does not make sense to derive one meaning from another. Furthermore, rather than letting writing conventions dictate what units are accepted as input to one's favorite tool for constructing semantic vectors (including not and again, but excluding the prefixes un-, in-, and re-), we included not only lexomes for content words, but also lexomes for prefixes and suffixes, as units for constructing a semantic vector space. As a result, semantic vectors for prefixes and suffixes are obtained straightforwardly, together with semantic vectors for content words.

To illustrate this procedure, consider the sentence the boys' happiness was great to observe. Standard methods will apply stemming and remove stop words, resulting in the lexomes boy, happiness, great, and observe being the input to the algorithm constructing semantic vectors from lexical co-occurrences. In our approach, by contrast, the lexomes considered are the, boy, happiness, be, great, to, and observe, and in addition plural, ness, and past. Stop words are retained: although including function words may seem counterproductive from the perspective of semantic vectors in natural language processing applications, they are retained in the present approach for two reasons. First, since in our model semantic vectors drive speech production, semantic


vectors for function words are required in order to model the production of function words. Second, although the semantic vectors of function words typically are less informative (they typically have small association strengths to very large numbers of words), they still show structure, as illustrated in Appendix C for pronouns and prepositions. Furthermore, inflectional endings are replaced by the inflectional functions they subserve, and for derived words the semantic function of the derivational suffix is identified as well. In what follows, we outline in more detail what considerations motivate this way of constructing the input for our algorithm constructing semantic vectors. We note here that our method for handling morphologically complex words can be combined with any of the currently available algorithms for building semantic vectors. We also note here that we are not concerned with the question of how lexomes for plurality or tense might be induced from word forms, from world knowledge, or any combination of the two. All we assume is, first, that anyone understanding the sentence the boys' happiness was great to observe knows that more than one boy is involved, and that the narrator situates the event in the past. Second, we take this understanding to drive learning.

We distinguished the following seven inflectional functions: comparative and superlative for adjectives; singular and plural for nouns; and past, perfective, continuous, persistence (denoting the persistent relevance of the predicate in sentences such as London is a big city), and person3 (third person singular) for verbs.

The semantics of derived words tend to be characterized by idiosyncrasies [70] that can be accompanied by formal idiosyncrasies (e.g., business and resound), but often are formally unremarkable (e.g., heady, with meanings as diverse as intoxicating, exhilarating, impetuous, and prudent). We therefore paired derived words with their own content lexomes, as well as with a lexome for the derivational function expressed in the derived word. We implemented the following derivational lexomes: ordinal for ordinal numbers; not for negative un and in; undo for reversative un; other for nonnegative in and its allomorphs; ion for ation, ution, ition, and ion; ee for ee; agent for agents with er; instrument for instruments with er; impagent for impersonal agents with er; causer for words with er expressing causers (the differentiation of different semantic functions for er was based on manual annotation of the list of forms with er; the assignment of a given form to a category was not informed by sentence context, and hence can be incorrect); again for re; and ness, ity, ism, ist, ic, able, ive, ous, ize, ence, ful, ish, under, sub, self, over, out, mis, and dis for the corresponding prefixes and suffixes. This set-up of lexomes is informed by the literature on the semantics of English word formation [71-73], and targets affixal semantics irrespective of affixal forms (following [74]).

This way of setting up the lexomes for the semantic vector space model illustrates an important aspect of the present approach, namely, that lexomes target what is understood, and not particular linguistic forms. Lexomes are not units of form. For example, in the study of Geeraert et al. [75], the forms die, pass away, and kick the bucket are all linked to the same lexome die. In the present study, we have not attempted to identify idioms, and likewise no attempt was

made to disambiguate homographs. These are targets for further research.

In the present study, we did implement the classical distinction between inflection and word formation, by representing inflected words with a content lexome for their stem, but derived words with a content lexome for the derived word itself. In this way, the sentence scientists believe that exposure to shortwave ultraviolet rays can cause blindness is associated with the following lexomes: scientist, pl, believe, persistence, that, exposure, sg, to, shortwave, ultraviolet, ray, can, present, cause, blindness, and ness. In our model, sentences constitute the learning events, i.e., for each sentence we train the model to predict the lexomes in that sentence from the very same lexomes in that sentence. Sentences, or for spoken corpora utterances, are more ecologically valid than windows of, say, two words preceding and two words following a target word; the interpretation of a word may depend on the company that it keeps at long distances in a sentence. We also did not remove any function words, contrary to standard practice in distributional semantics (see Appendix C for a heatmap illustrating the kind of clustering observable for pronouns and prepositions). The inclusion of semantic vectors for both affixes and function words is necessitated by the general framework of our model, which addresses not only comprehension but also production, and which, in the case of comprehension, needs to move beyond lexical identification, as inflectional functions play an important role during the integration of words in sentence and discourse.

Only words with a frequency exceeding 8 occurrences in the tasa corpus were assigned lexomes. This threshold was imposed in order to avoid excessive data sparseness when constructing the distributional vector space. The number of different lexomes that met the criterion for inclusion was 23,562.

We constructed a semantic vector space by training an ndl network on the tasa corpus, using the ndl2 package for R [76]. Weights on lexome-to-lexome connections were recalibrated sentence by sentence, in the order in which they appear in the tasa corpus, using the learning rule of ndl (i.e., a simplified version of the learning rule of Rescorla and Wagner [43]) that has only two free parameters: the maximum amount of learning λ (set to 1) and a learning rate ρ (set to 0.001). This resulted in a 23,562 × 23,562 matrix. Sentence-based training keeps the carbon footprint of the model down, as the number of learning events is restricted to the number of utterances (approximately 750,000) rather than the number of word tokens (approximately 10,000,000). The row vectors of the resulting matrix, henceforth S, are the semantic vectors that we will use in the remainder of this study. (Work is in progress to further enhance the S matrix by including word sense disambiguation and named entity recognition when setting up lexomes. The resulting lexomic version of the TASA corpus, and scripts (in python as well as in R) for deriving the S matrix, will be made available at http://www.sfs.uni-tuebingen.de/~hbaayen.)
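For concreteness, the following R sketch implements one pass of this sentence-based learning for a toy set of learning events; the lexome names and 'sentences' are invented for illustration, and the actual matrix S was computed with the ndl2 package over the full set of TASA sentences.

    # Simplified Rescorla-Wagner learning over sentences (lexome-to-lexome).
    # Toy learning events; in the actual model these are TASA sentences.
    sentences <- list(c("the", "boy", "happiness", "PL", "NESS", "PAST"),
                      c("the", "boy", "great", "PAST"))
    lexomes <- sort(unique(unlist(sentences)))
    W <- matrix(0, length(lexomes), length(lexomes),
                dimnames = list(lexomes, lexomes))   # cue (row) -> outcome (col)
    lambda <- 1; rho <- 0.001                        # parameters as in the text

    for (s in sentences) {                           # one learning event per sentence
      cues <- unique(s); outcomes <- unique(s)       # same lexomes as cues and outcomes
      act <- colSums(W[cues, , drop = FALSE])        # summed support for each outcome
      target <- as.numeric(colnames(W) %in% outcomes) * lambda
      # every present cue is adjusted in proportion to the prediction error
      W[cues, ] <- W[cues, ] +
        rho * matrix(target - act, length(cues), ncol(W), byrow = TRUE)
    }
    round(W["NESS", c("boy", "happiness")], 5)       # inspect a few weights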

The weights on the main diagonal of the matrix tend to be high, unsurprisingly, as in each sentence on which the model is trained, each of the words in that sentence is an excellent


predictor of that same word occurring in that sentence. When the focus is on semantic similarity, it is useful to set the main diagonal of this matrix to zero. However, for more general association strengths between words, the diagonal elements are informative and should be retained.

Unlike standard models of distributional semantics, we did not carry out any dimension reduction using, for instance, singular value decomposition. Because we do not work with latent variables, each dimension of our semantic space is given by a column vector of S, and hence each dimension is linked to a specific lexome. Thus, the row vectors of S specify for each of the lexomes how well this lexome discriminates between all the lexomes known to the model.

Although semantic vectors of length 23,562 can provide good results, vectors of this length can be challenging for statistical evaluation. Fortunately, it turns out that many of the column vectors of S are characterized by extremely small variance. Such columns can be removed from the matrix without loss of accuracy. In practice, we have found that it suffices to work with approximately 4 to 5 thousand columns, selected by calculating column variances and using only those columns with a variance exceeding a threshold value.
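In R, this column selection is a one-liner; the threshold value below is purely illustrative, not the value used in the study.

    # Keep only those columns of S whose variance exceeds a small threshold.
    col_var   <- apply(S, 2, var)
    S_reduced <- S[, col_var > 1e-8, drop = FALSE]   # illustrative cutoff
    dim(S_reduced)   # roughly 4 to 5 thousand columns remain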

3.2. Validation of the Semantic Vector Space. As shown by Baayen et al. [59] and Milin et al. [58, 77], measures based on matrices such as S are predictive for behavioral measures such as reaction times in the visual lexical decision task, as well as for self-paced reading latencies. In what follows, we first validate the semantic vectors of S on two data sets: one data set with accuracies in paired associate learning, and one dataset with semantic similarity ratings. We then considered specifically the validity of the semantic vectors for inflectional and derivational functions, by focusing first on the correlational structure of the pertinent semantic vectors, followed by an examination of the predictivity of the semantic vectors for semantic plausibility and semantic transparency ratings for derived words.

3.2.1. Paired Associate Learning. The paired associate learning (PAL) task is a widely used test in psychology for evaluating learning and memory. Participants are given a list of word pairs to memorize. Subsequently, at testing, they are given the first word and have to recall the second word. The proportion of correctly recalled words is the accuracy measure on the PAL task. Accuracy on the PAL test decreases with age, which has been attributed to cognitive decline over the lifetime. However, Ramscar et al. [78] and Ramscar et al. [79] provide evidence that the test actually measures the accumulation of lexical knowledge. In what follows, we use the data on PAL performance reported by desRosiers and Ivison [80]. We fitted a linear mixed model to accuracy in the PAL task as a function of the Pearson correlation r of paired words' semantic vectors in S (but with weights on the main diagonal included, and using the 4,275 columns with highest variance), with random intercepts for word pairs, sex and age as control variables, and, crucially, an interaction of r by age. Given the findings of Ramscar and colleagues, we expect to find that the slope of r (which is always negative) as a predictor of PAL accuracy increases with age (indicating decreasing accuracy). Table 1 shows that this prediction is borne out. For age group 20-29 (the reference level of age), the slope for r is estimated at 0.31. For the next age level, this slope is adjusted upward by 0.12, and as age increases, these upward adjustments likewise increase, to 0.21, 0.25, and 0.32 for age groups 40-49, 50-59, and 60-69, respectively. Older participants know their language better, and hence are more sensitive to the semantic similarity (or lack thereof) of the words that make up PAL test pairs. For the purposes of the present study, the solid support for r as a predictor for PAL accuracy contributes to validating the row vectors of S as semantic vectors.

Figure 1: Similarity rating in the MEN dataset as a function of the correlation r between the row vectors in S of the words in the test pairs.

3.2.2. Semantic Relatedness Ratings. We also examined performance of S on the MEN test collection [81], which provides crowd-sourced ratings of semantic similarity for 3,000 word pairs. For 2,267 word pairs, semantic vectors are available in S for both words. Figure 1 shows that there is a nonlinear relation between the correlation of words' semantic vectors in S and the ratings. The plot shows as well that for low correlations, the variability in the MEN ratings is larger. We fitted a Gaussian location-scale additive model to this data set, summarized in Table 2, which supported r as a predictor for both the mean MEN rating and the variability in the MEN ratings.
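A Gaussian location-scale additive model of this kind can be fitted with the mgcv package in R. The sketch below is our reconstruction of a model with the structure summarized in Table 2; the data frame men and its column names are assumptions made for illustration.

    library(mgcv)
    # men: hypothetical data frame with one row per word pair, 'rating' the
    # MEN similarity rating and 'r' the correlation of the two words'
    # semantic vectors in S.
    m <- gam(list(rating ~ s(r),   # location: mean rating as a smooth of r
                  ~ s(r)),         # scale: standard deviation (logb link) as a smooth of r
             family = gaulss(), data = men)
    summary(m)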

To put this performance in perspective, we collected the latent semantic analysis (LSA) similarity scores for the MEN word pairs, using the website at http://lsa.colorado.edu. The Spearman correlation for LSA scores and MEN ratings was 0.697, and that for r was 0.704 (both p < 0.0001). Thus, our semantic vectors perform on a par with those of LSA, a well-established older technique that still enjoys wide use in psycholinguistics. Undoubtedly, optimized techniques from computational linguistics such as word2vec will outperform our model. The important point here is that even with training on full sentences rather than using small windows, and even when including function words, performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.

Table 1: Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [-0.488, -0.253]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

A. parametric coefficients     Estimate   Std. Error   t-value    p-value
intercept                        3.6220       0.6822    5.3096   < 0.0001
r                                0.3058       0.1672    1.8293     0.0691
age=39                           0.2297       0.1869    1.2292     0.2207
age=49                           0.4665       0.1869    2.4964     0.0135
age=59                           0.6078       0.1869    3.2528     0.0014
age=69                           0.8029       0.1869    4.2970   < 0.0001
sex=male                        -0.1074       0.0230   -4.6638   < 0.0001
r × age=39                       0.1167       0.0458    2.5490     0.0117
r × age=49                       0.2090       0.0458    4.5640   < 0.0001
r × age=59                       0.2463       0.0458    5.3787   < 0.0001
r × age=69                       0.3239       0.0458    7.0735   < 0.0001
B. smooth terms                     edf       Ref.df    F-value    p-value
random intercepts word pair     17.8607      18.0000   128.2283  < 0.0001

Table 2: Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location: parameter estimating the mean; scale: parameter estimating the variance through the logb link function (η = log(σ - b)).

A. parametric coefficients     Estimate   Std. Error    t-value    p-value
Intercept [location]            25.2990       0.1799   140.6127  < 0.0001
Intercept [scale]                2.1297       0.0149   143.0299  < 0.0001
B. smooth terms                     edf       Ref.df    F-value    p-value
TPRS r [location]                8.3749       8.8869  2766.2399  < 0.0001
TPRS r [scale]                   5.9333       7.0476    87.5384  < 0.0001

3.2.3. Correlational Structure of Inflectional and Derivational Vectors. Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner, and tend to be either not correlated with derivational lexomes (indicated by light yellow) or to be negatively correlated (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show small positive correlations for a majority of pairs, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes, and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3,500 derived words with 31 different formatives, resulting in a 3,500 × 23,562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82] (for LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3,531 × 814). LDA accuracy for this classification task with 31 possible outcomes was 0.72. All 31 derivational lexomes were correctly classified.
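The classification step is a direct application of the lda function. The sketch below assumes a matrix D of semantic vectors (with low-variance columns already removed) and a factor formative coding each row's formative; both object names are ours.

    library(MASS)
    # D: matrix of semantic vectors (rows = derived words plus the 31
    # formative lexomes), low-variance columns removed;
    # formative: factor with 31 levels giving each row's formative.
    fit  <- lda(x = D, grouping = formative)
    pred <- predict(fit, D)$class
    mean(pred == formative)     # proportion of rows classified correctly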

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across 10 permutations: 0.519-0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.

Figure 2: Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes (individual words will become legible by zooming in on the figure at maximal magnification).

How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and at the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype: this vector is close to the cloud of semantic vectors, but it is outside its periphery. This pattern is not specific to ness, but is found for the other derivational lexomes as well. It is intrinsic to our model.

Figure 3: Heatmap for the correlation matrix of lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)

The reason for this is straightforward. During learning, although for a derived word its lexome i and the derivational lexome ness are co-present cues, the derivational lexome occurs in many other words j as well, and each time another word j is encountered, the weights from ness to i are reduced. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property of our model for capturing morphological productivity in comprehension and speech production.

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed for the vectors obtained by subtracting the vector of the base word from that of the derived word, the result is a vector that is embedded inside the cluster of derived vectors, and hence inherits semantic idiosyncrasies from all these derived words.

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs


consisting of a base and a novel derived form (e.g., accent/accentable), reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2,559 pairs, due to the constraint that the base words had to occur in the tasa corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases, and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies more strong links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive, and were therefore left out of the specification of the model reported here (word frequency was not included as a predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by correlating (i) the semantic vector of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = -0.66), effectiveness (r = -0.51), awareness (r = -0.50), loneliness (r = -0.45), sickness (r = -0.44), and consciousness (r = -0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) with its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
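In code, this comparison amounts to a single correlation per derived word. The sketch below assumes that the rows of S are indexed by lexome labels and that "NESS" labels the row of the derivational lexome; the word/base pairings shown are illustrative.

    # Correlate the observed semantic vector of a derived word with the
    # vector obtained by summing the vectors of its base and its affix.
    transparency <- function(derived, base, affix = "NESS") {
      cor(S[derived, ], S[base, ] + S[affix, ])
    }
    transparency("business", "busy")    # reported in the text as r = -0.66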

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the tasa corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall, transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output. (A package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz. The package is described in Baayen et al. [85].) We begin with introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension, and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

\[
\mathbf{C} =
\begin{array}{l|cccccccc}
             & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\
\hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}
\tag{12}
\]

where we use the disc keyboard phonetic alphabet (the 'DIstinct Single Character' representation, with one character for each phoneme, introduced by celex [86]) for triphones; the symbol # denotes the word boundary.


Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients            Estimate   Std. Error   t-value   p-value
Intercept                              -1.8072       2.7560   -0.6557    0.5127
Word Length                             0.5901       0.2113    2.7931    0.0057
Activation Diversity                    1.1221       0.6747    1.6632    0.0976
Arousal                                 0.3847       0.1521    2.5295    0.0121
Word Length × Activation Diversity     -0.1318       0.0500   -2.6369    0.0089
B. smooth terms                             edf       Ref.df   F-value   p-value
Random Intercepts Affix                  3.3610       4.0000   55.6723  < 0.0001

Suppose that the semantic vectors for these words are the row vectors of the following matrix S:

\[
\mathbf{S} =
\begin{array}{l|ccc}
             & \text{one} & \text{two} & \text{three} \\
\hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array}
\tag{13}
\]

We are interested in a transformation matrix F such that

CF = S (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S (15)

For the present example

\[
\mathbf{F} =
\begin{array}{l|ccc}
            & \text{one} & \text{two} & \text{three} \\
\hline
\text{\#wV} & 0.33 & 0.10 & 0.13 \\
\text{wVn}  & 0.33 & 0.10 & 0.13 \\
\text{Vn\#} & 0.33 & 0.10 & 0.13 \\
\text{\#tu} & 0.10 & 0.50 & 0.05 \\
\text{tu\#} & 0.10 & 0.50 & 0.05 \\
\text{\#Tr} & 0.03 & 0.03 & 0.33 \\
\text{Tri}  & 0.03 & 0.03 & 0.33 \\
\text{ri\#} & 0.03 & 0.03 & 0.33
\end{array}
\tag{16}
\]

and for this simple example, CF is exactly equal to S.
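The computation can be carried out directly with the ginv function from MASS mentioned above. The following R sketch reproduces the toy example of (12), (13), and (16); the object is named Fmat only to avoid clashing with R's built-in F.

    library(MASS)
    # Toy cue matrix C (rows: words, columns: triphones) and semantic matrix S.
    tri <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
    C <- matrix(c(1,1,1,0,0,0,0,0,
                  0,0,0,1,1,0,0,0,
                  0,0,0,0,0,1,1,1), nrow = 3, byrow = TRUE,
                dimnames = list(c("one","two","three"), tri))
    S <- matrix(c(1.0, 0.3, 0.4,
                  0.2, 1.0, 0.1,
                  0.1, 0.1, 1.0), nrow = 3, byrow = TRUE,
                dimnames = list(c("one","two","three"), c("one","two","three")))

    Fmat <- ginv(C) %*% S    # F = C'S, cf. (15)
    round(C %*% Fmat, 2)     # equals S for this toy example
    round(Fmat, 2)           # matches the matrix shown in (16)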

In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]), and one using phone trigrams. A comparison of the performance of the two models


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words. te: tensor product smooth; s: thin plate regression spline.

A. parametric coefficients                          Estimate   Std. Error   t-value   p-value
intercept [location]                                  5.5016       0.2564   21.4537  < 0.0001
intercept [scale]                                    -0.4135       0.0401  -10.3060  < 0.0001
B. smooth terms                                           edf       Ref.df   F-value   p-value
te(activation diversity, word length) [location]       5.3879       6.2897   23.7798    0.0008
s(derivational lexome) [location]                     14.3180      15.0000  380.6254  < 0.0001
s(activation diversity) [scale]                        2.1282       2.5993   28.4336  < 0.0001
s(word length) [scale]                                 1.7722       2.1403  132.7311  < 0.0001

sheds light on the role of words' 'sound image' in word recognition in reading (cf. [87-89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is more strongly correlated with the target semantic vector s_i than with any of the target vectors s_j of the other words j.
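Spelled out in code, the evaluation looks as follows; the sketch assumes a cue matrix C, a transformation matrix Fmat estimated as in the previous sketch, and the target matrix S, all with words as rows.

    # Predicted semantic vectors: one row per word.
    S_hat <- C %*% Fmat
    # For each word, find the target row of S that correlates most strongly
    # with its predicted vector; a word counts as correctly recognized when
    # that best-correlated target is its own semantic vector.
    corrs    <- cor(t(S_hat), t(S))    # words x words correlation matrix
    best     <- apply(corrs, 1, which.max)
    accuracy <- mean(best == seq_len(nrow(S)))
    accuracy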

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3,987 monolexomic English words, 6,595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, for a total of 11,480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), (iii)


the word being available in the CELEX database, and the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3,465 unique trigrams, resulting in an 11,480 × 3,465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5,030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10⁻⁷. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated, in the mean, with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic

vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6,595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derivational semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route: From Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100-105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own 'inner voice' while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above; however, a new transformation matrix F was obtained for mapping an 11,480 × 5,929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words, the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have, for each word, two estimated semantic vectors: a vector ŝ1 obtained with network F of the first route, and a vector ŝ2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11,480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
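Given the two predicted semantic vectors of a word, the three measures are straightforward to compute. In the sketch below, s1 and s2 are the predicted vectors of the two routes and lexome indexes the word's column in S; the variable names are ours.

    # Measures entering the analysis of the BLP decision latencies.
    total_activation_diversity <- sum(abs(s1)) + sum(abs(s2))  # support for lexicality
    route_congruency           <- cor(s1, s2)                  # agreement of the two routes
    prior                      <- sum(abs(S[, lexome]))        # L1-norm of the lexome's column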

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel): for all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
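Our reconstruction of a GAM with the structure summarized in Table 5 is sketched below with mgcv; the data frame blp and its column names are assumptions, and the specification actually used in the study may differ in detail.

    library(mgcv)
    # blp: hypothetical data frame with one row per word, containing the
    # inverse-transformed decision latency and the three measures above.
    m_ldl <- bam(inv_RT ~ word_type + word_length +
                          s(total_activation_diversity) +
                          te(route_congruency, prior),
                 data = blp)
    summary(m_ldl)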

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel): for medium range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65,101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increases by 10,059 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increases by no less than 14,790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger than the number of trigram features (3,465 trigrams versus 5,929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error    t-value   p-value
intercept                          -1.7774       0.0087   -205.414  < 0.0001
word type = Inflected               0.1110       0.0059     18.742  < 0.0001
word type = Monomorphemic           0.0233       0.0054      4.322  < 0.0001
word length                         0.0117       0.0011     10.981  < 0.0001
B. smooth terms                         edf       Ref.df          F   p-value
s(total activation diversity)         6.002        7.222       23.6  < 0.0001
te(route congruency, prior)          14.673       17.850      21.37  < 0.0001

Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.

Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients                     Estimate   Std. Error    t-value   p-value
intercept                                       -1.6934       0.0086   -195.933  < 0.0001
word type = Inflected                            0.0286       0.0052      5.486  < 0.0001
word type = Monomorphemic                        0.0042       0.0055      0.770       0.4
word length                                      0.0066       0.0011      5.893  < 0.0001
B. smooth terms                                      edf       Ref.df          F   p-value
te(activation, activation diversity, prior)        41.34        49.72      10.84  < 0.0001

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as its primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token

to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112-114]. However, the phoneme as theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk, and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.
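As an illustration of the format of these features, the following R sketch assembles the summary string for a single frequency band of a single chunk. It assumes that the chunking at the Hilbert envelope minima and the discretization of the MEL spectrum into intensity levels 1-5 have already been carried out (in practice by AcousticNDLCodeR); the matrix `chunk` and the helper `fbsf_band` are illustrative constructs, not functions of that package, and the string format simply follows the example shown in Figure 8.

```r
## Illustrative sketch only: 'chunk' is assumed to be a 21 x t matrix of
## discretized intensity levels (1-5) for one chunk of the signal; 'band' is
## the frequency band number and 'part' the chunk number.
fbsf_band <- function(chunk, band, part) {
  x <- chunk[band, ]
  paste0("band",    band,
         "-start",  x[1],
         "-median", round(median(x)),
         "-min",    min(x),
         "-max",    max(x),
         "-end",    x[length(x)],
         "-part",   part)
}
## e.g., fbsf_band(chunk, band = 11, part = 2) could yield
## "band11-start3-median3-min2-max5-end3-part2"
```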

We tested ldl on 20 hours of speech sampled from the audio files of the ucla library broadcast newsscape data, a vast repository of multimodal TV news broadcasts,


Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean, for relatively clean parts where there is speech without background noise or music, and noisy, for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a, with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of

the semantic vectors. Although the transformation matrix F could be obtained by calculating C'S, the calculation of C' is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa. Thus

$$ \begin{pmatrix} 3 & 8 \\ 7 & 2 \end{pmatrix}^T = \begin{pmatrix} 3 & 7 \\ 8 & 2 \end{pmatrix}. $$

For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

$$
\begin{aligned}
C F &= S \\
C^T C F &= C^T S \\
(C^T C)^{-1} (C^T C) F &= (C^T C)^{-1} C^T S \\
F &= (C^T C)^{-1} C^T S
\end{aligned} \qquad (17)
$$

In this way, matrix inversion is required for a much smaller matrix, C^T C, which is a square matrix of size 40639 × 40639.
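A minimal R sketch of this estimation step is given below; the objects C (the 131673 × 40639 cue matrix) and S (the 131673 × 4609 semantic matrix) are assumed to be available, and in practice C^T C may be near-singular, in which case a generalized inverse or some regularization would be needed.

```r
## Sketch of equation (17): estimate the transformation matrix via the normal
## equations, so that only the 40639 x 40639 matrix t(C) %*% C has to be
## inverted (rather than computing the generalized inverse of C itself).
CtC  <- crossprod(C)        # t(C) %*% C
CtS  <- crossprod(C, S)     # t(C) %*% S
Fa   <- solve(CtC, CtS)     # Fa corresponds to F in (17); the name F itself is
                            # avoided because F is TRUE/FALSE shorthand in R
Shat <- C %*% Fa            # estimated semantic vectors, one row per token
```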

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the number of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model: when presented with utterances instead of single words, it performs remarkably well.
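The evaluation procedure can be sketched as follows; Shat and S here denote the matrices of predicted and targeted (gold) semantic vectors, one row per audio token, and the full 131673 × 131673 correlation matrix would in practice be computed in blocks rather than in one go.

```r
## Sketch of the precision calculation: a token counts as correctly recognized
## if its predicted semantic vector correlates more strongly with its own gold
## vector than with any other gold vector.
r         <- cor(t(Shat), t(S))           # rows of Shat vs. rows of S
best      <- apply(r, 1, which.max)       # best-matching gold vector per token
precision <- mean(best == seq_len(nrow(Shat)))
```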

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121-123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating

further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

$$
S = \begin{array}{c|ccc}
             & \text{one} & \text{two} & \text{three} \\ \hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array} \qquad (18)
$$

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


$$
T = \begin{array}{c|cccccccc}
             & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array} \qquad (19)
$$

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

$$ SG = T \qquad (20) $$

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

$$ t = sG \qquad (21) $$

As before, the transformation matrix G is straightforward to estimate. Let S' denote the Moore-Penrose generalized inverse of S. Since

$$ S'SG = S'T \qquad (22) $$

we have

$$ G = S'T \qquad (23) $$

Given G, we can predict the triphone matrix $\hat{T}$ from the semantic matrix S:

$$ SG = \hat{T} \qquad (24) $$

For the present example, the inverse of S, S', is

$$
S' = \begin{array}{c|ccc}
             & \text{one} & \text{two} & \text{three} \\ \hline
\text{one}   &  1.10 & -0.29 & -0.41 \\
\text{two}   & -0.21 &  1.07 & -0.02 \\
\text{three} & -0.09 & -0.08 &  1.04
\end{array} \qquad (25)
$$

and the transformation matrix G is

$$
G = \begin{array}{c|cccccccc}
             & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one}   &  1.10 &  1.10 &  1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \\
\text{two}   & -0.21 & -0.21 & -0.21 &  1.07 &  1.07 & -0.02 & -0.02 & -0.02 \\
\text{three} & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 &  1.04 &  1.04 &  1.04
\end{array} \qquad (26)
$$

For this simple example, $\hat{T}$ is virtually identical to T. For realistic data, $\hat{T}$ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
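The toy example of equations (18)-(26) can be reproduced with a few lines of R; the boundary-marked triphone labels are our rendering of the triphones of one, two, and three, and MASS::ginv provides the Moore-Penrose generalized inverse.

```r
## Sketch of the toy production mapping: S holds the semantic vectors (18),
## Tmat the triphone indicator matrix (19); G = S'T as in (23).
library(MASS)

words <- c("one", "two", "three")
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(words, words))
Tmat <- matrix(c(1,1,1,0,0,0,0,0,
                 0,0,0,1,1,0,0,0,
                 0,0,0,0,0,1,1,1),
               nrow = 3, byrow = TRUE,
               dimnames = list(words,
                 c("#wV","wVn","Vn#","#tu","tu#","#Tr","Tri","ri#")))

G    <- ginv(S) %*% Tmat     # equation (23)
That <- S %*% G              # equation (24): predicted triphone support
round(That, 2)               # virtually identical to Tmat
```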

We made use of the S matrix, which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.
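The column selection just described amounts to a few lines of R; S is assumed to be the word-by-lexome semantic matrix, and n the number of distinct triphones in the dataset.

```r
## Sketch: retain the n columns of S with the highest variance, with n set to
## the number of different triphones (around 4500), so that no variance
## threshold has to be added as a free parameter.
keep <- order(apply(S, 2, var), decreasing = TRUE)[seq_len(n)]
S    <- S[, keep]
```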

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m); those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the


longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
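A hedged sketch of this reconstruction step is given below; `support` is assumed to be a named vector of predicted triphone supports for one word (a row of $\hat{T}$), triphones are assumed to carry a word-boundary marker #, and the actual algorithm used in the study (see the appendix) additionally employs further thresholds and a length penalty.

```r
## Sketch: triphones with support above the threshold become vertices, an edge
## connects triphone xyz to triphone yzw (two-segment overlap), and all simple
## paths starting from a word-initial triphone are enumerated; the longest
## path is taken as the reconstructed form.
library(igraph)

active <- names(support)[support > 0.99]
edges  <- do.call(rbind, lapply(active, function(v) {
  nxt <- active[substr(active, 1, 2) == substr(v, 2, 3)]
  if (length(nxt) == 0) NULL else cbind(v, nxt)
}))
g      <- graph_from_edgelist(edges)
starts <- intersect(active[substr(active, 1, 1) == "#"], V(g)$name)
paths  <- unlist(lapply(starts, function(s) all_simple_paths(g, from = s)),
                 recursive = FALSE)
best   <- paths[[which.max(sapply(paths, length))]]   # longest path
```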

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors on their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_a^m from S_m and add the semantic vector s_a of the affix:

$$ S_a = S_a^m + i \otimes s_a \qquad (27) $$

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a,

$$ S_a G_a = T_a \qquad (28) $$

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

$$ \begin{bmatrix} S_m \\ S_a \end{bmatrix} G_a = \begin{bmatrix} T_m \\ T_a \end{bmatrix} \qquad (29) $$

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

$$
\begin{bmatrix} S_m \\ S_{a_1} \\ S_{a_2} \\ \vdots \\ S_{a_n} \end{bmatrix} G
= \begin{bmatrix} T_m \\ T_{a_1} \\ T_{a_2} \\ \vdots \\ T_{a_n} \end{bmatrix} \qquad (30)
$$

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
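A sketch of this joint set-up, for a single inflectional function a and under the assumption that S_m, T_m, T_a, the affix vector s_a, and the row indices base_idx of the pertinent base words are available, is given below; further inflectional functions are handled by stacking additional blocks of rows before solving.

```r
## Sketch of equations (27)-(30): semantic vectors of inflected words are the
## base-word vectors plus the affix vector, base and inflected rows are
## stacked, and a single transformation matrix G is estimated for all of them.
library(MASS)

S_a   <- S_m[base_idx, ] +
         matrix(s_a, nrow = length(base_idx), ncol = length(s_a), byrow = TRUE)
S_aug <- rbind(S_m, S_a)          # add further S_a blocks for other functions
T_aug <- rbind(T_m, T_a)
G     <- ginv(S_aug) %*% T_aug    # one mapping for all inflectional functions
```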

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus,


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of $\hat{T} = SG$ supported the targeted triphones above all other triphones, without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths $\hat{T}$. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z], sewer, which the model produced with R instead of only R, and tumbler, where the model used syllabic [l] instead of nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI, henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight

goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
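In mgcv notation, the model of Table 7 can be sketched as follows; the column names of the data frame buckeye are our own labels, not those of the original analysis.

```r
## Hedged sketch of the generalized additive mixed model of Table 7: random
## intercepts for word and speaker (bs = "re"), a thin plate regression spline
## for log frequency, and linear terms for the remaining predictors.
library(mgcv)
m <- bam(log_rel_duration ~ log_ldl_edge_weight + phrase_rate + log_ncount +
           s(log_frequency) + s(word, bs = "re") + s(speaker, bs = "re"),
         data = buckeye)
summary(m)
```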

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133-138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients      Estimate    Std. Error   t-value     p-value
Intercept                       -1.9165     0.0592       -32.3908    < 0.0001
Log LDL Edge Weight             -0.1358     0.0400        -3.3960      0.0007
Phrase Rate                      0.0047     0.0021         2.2449      0.0248
Log Ncount                       0.1114     0.0189         5.8837    < 0.0001

B. smooth terms                 edf          Ref.df       F-value     p-value
TPRS log Frequency                 1.9007       1.9050      7.5495      0.0004
random intercepts word          1149.8839    1323.0000     31.4919    < 0.0001
random intercepts speaker         30.2995      39.0000     34.5579    < 0.0001

There is remarkable convergence between the directed graphs for speech production such as illustrated in Figure 9 and computational models using temporal self-organizing maps (TSOMs, [139-141]). TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
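For completeness, the graphopt layout is available directly in the igraph package; a sketch, assuming a directed, weighted igraph object g for the triphone graph of Figure 9, is:

```r
## Sketch: self-organize the vertices of the triphone graph in the plane with
## the graphopt algorithm and plot the graph with its edge weights (assumed to
## be stored in the edge attribute 'weight').
library(igraph)
coords <- layout_with_graphopt(g)
plot(g, layout = coords, edge.label = round(E(g)$weight, 2))
```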

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises first of all an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk ([natyrlk])) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors, training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors $\hat{L}$. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JaneClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
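A sketch of this illustration in R is given below; the construction of the trigram matrix C (with word forms as row names) and the lexome matrix L from the sentences of Table 8, as well as the matrix entities holding the targeted semantic vectors of the named entities, is assumed.

```r
## Sketch: solve CF = L for the trigram-to-lexome network, obtain estimated
## semantic vectors, and, for sequential reading of "John" followed by
## "Welsch", sum the two estimates before correlating them with the targeted
## named-entity vectors (as in Figure 10, upper right panel).
library(MASS)

Fo   <- ginv(C) %*% L
Lhat <- C %*% Fo
jw   <- Lhat["John", ] + Lhat["Welsch", ]
apply(entities, 1, function(v) cor(jw, v))
```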

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JaneClark.

can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored


Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
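A minimal sketch of such trial-by-trial updating with the Widrow-Hoff rule is given below; C and S are assumed to hold the cue and target vectors of the individual learning events (one row per event), and the learning rate is purely illustrative.

```r
## Sketch of incremental estimation: for each learning event with cue vector
## 'cue' and target vector 'target', the weight matrix W moves a small step in
## the direction that reduces the prediction error target - cue %*% W
## (the delta rule).
widrow_hoff_update <- function(W, cue, target, rate = 0.001) {
  e <- target - as.vector(cue %*% W)   # prediction error for this event
  W + rate * (cue %o% e)               # cue-weighted error correction
}

W <- matrix(0, nrow = ncol(C), ncol = ncol(S))
for (i in seq_len(nrow(C))) {
  W <- widrow_hoff_update(W, C[i, ], S[i, ])
}
```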

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures (typically represented by graphs constructed over semantic primitives; see, e.g., [154]) are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155-157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older, and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change


considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are, of course, context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network $\mathbf{F}_a$ is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis, compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds and what its consequences are at both sides of the conscious/unconscious divide is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, $a$ and $b$, with coordinates $(3, 4)$ and $(-2, -3)$. Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the $x$ coordinate of a point next to the $y$ coordinate:

$$\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3) \tag{A1}$$


Figure 12: Points $a$ and $b$ can be transformed into points $x$ and $y$ by a linear transformation.

We can represent $\mathbf{a}$ and $\mathbf{b}$ jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

$$\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \tag{A2}$$

In addition to the points $a$ and $b$, Figure 12 also shows two other data points, $x$ and $y$. The matrix for these two points is

$$\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \tag{A3}$$

Let us assume that points $a$ and $b$ represent the forms of two words $\omega_1$ and $\omega_2$, and that the points $x$ and $y$ represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points $a$ and $b$ into points $x$ and $y$, and likewise, in how to transform points $x$ and $y$ back into points $a$ and $b$. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix $\mathbf{A}$, which we want to transform into matrix $\mathbf{B}$. A way of mapping $\mathbf{A}$ onto $\mathbf{B}$ is by means of a linear transformation using matrix multiplication. For $2 \times 2$ matrices, matrix multiplication is defined as follows:
$$\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix} \tag{A4}$$

Such a mapping of $\mathbf{A}$ onto $\mathbf{B}$ is given by a matrix $\mathbf{F}$,

$$\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} \tag{A5}$$

and it is straightforward to verify that indeed
$$\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \tag{A6}$$

Using matrix notation, we can write

$$\mathbf{A}\mathbf{F} = \mathbf{B} \tag{A7}$$

Given $\mathbf{A}$ and $\mathbf{B}$, how do we obtain $\mathbf{F}$? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix $\mathbf{I}$ leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \tag{A8}$$

For numbers, the inverse of multiplication with $s$ is dividing by $s$:
$$\left(\frac{1}{s}\right)(s x) = x \tag{A9}$$

For matrices, the inverse of a square matrix $\mathbf{X}$ is that matrix $\mathbf{X}^{-1}$ such that their product is the identity matrix:

$$\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I} \tag{A10}$$

For nonsquare matrices, the inverse $\mathbf{Y}^{-1}$ is defined such that

$$\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X} \tag{A11}$$

We find the square matrix $\mathbf{F}$ that maps the square matrix $\mathbf{A}$ onto $\mathbf{B}$ as follows:

$$\begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \end{aligned} \tag{A12}$$

Since for the present example $\mathbf{A}$ and its inverse happen to be identical ($\mathbf{A}^{-1} = \mathbf{A}$), we obtain (A6).
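
As a quick numerical check of this worked example (our own illustration in base R, not code from the study), $\mathbf{F}$ can be computed from $\mathbf{A}$ and $\mathbf{B}$ and then verified against (A6):

    A <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows are the form vectors a and b
    B <- matrix(c(-5, 4, 2, -1), nrow = 2)   # rows are the semantic vectors x and y

    Fmat <- solve(A) %*% B                   # F = A^{-1} B, cf. (A12); 'Fmat' avoids masking R's F
    Fmat                                     # the matrix of (A5)
    A %*% Fmat                               # reproduces B, cf. (A6) and (A7)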

$\mathbf{F}$ maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given $\mathbf{B}$, we want to find that matrix $\mathbf{G}$ which maps $\mathbf{B}$ onto $\mathbf{A}$, i.e.,

$$\mathbf{B}\mathbf{G} = \mathbf{A} \tag{A13}$$

Solving this equation proceeds exactly as above:

$$\begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \end{aligned} \tag{A14}$$


The inverse of $\mathbf{B}$ is

$$\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \tag{A15}$$

and hence we have

$$\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G} \tag{A16}$$

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix $\mathbf{X}$ by $\mathbf{X}'$.
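
For the toy matrices of this appendix the ordinary inverse happens to exist, but the same computations can be written with the generalized inverse throughout. Continuing the illustrative R example above (a sketch only):

    library(MASS)                            # ginv(): Moore-Penrose generalized inverse

    Bprime <- ginv(B)                        # for this nonsingular B, equal to B^{-1} of (A15)
    fractions(Bprime)                        #  1/3  2/3  /  4/3  5/3
    G <- Bprime %*% A                        # production mapping, cf. (A14) and (A16)
    fractions(G)                             # -1/3 -2/3  /  2/3  1/3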

The examples presented here are restricted to $2 \times 2$ matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices $\mathbf{X}$ and $\mathbf{Y}$, the only constraint is that the number of columns of $\mathbf{X}$ is the same as the number of rows of $\mathbf{Y}$. The resulting matrix has the same number of rows as $\mathbf{X}$ and the same number of columns as $\mathbf{Y}$.

In the present study, the rows of matrices represent words and the columns the features of these words. For instance, the row vectors of $\mathbf{A}$ can represent words' form features, and the row vectors of $\mathbf{B}$ the corresponding semantic features. The first row vector of $\mathbf{A}$ is mapped onto the first row vector of $\mathbf{B}$ as follows:
$$\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix} \tag{A17}$$

A transformation matrix such as $\mathbf{F}$ can be represented as a two-layer network. The network corresponding to $\mathbf{F}$ is shown in Figure 13. When the input vector $(3, 4)$ is presented to the network, the nodes $i_1$ and $i_2$ are set to 3 and 4, respectively. To obtain the activations $o_1 = -5$ and $o_2 = 2$ for the output vector $(-5, 2)$, we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for $o_1$ we obtain the value $3 \times 1 + 4 \times -2 = -5$, and for $o_2$ we have $3 \times 2 + 4 \times -1 = 2$. In other words, the network maps the input vector $(3, 4)$ onto the output vector $(-5, 2)$.

An important property of linear maps is that they are productive. We can present a novel form vector, say

$$\mathbf{s} = (2, 2) \tag{A18}$$

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
$$\begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix} \tag{A19}$$
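
In the illustrative R sketch above, this productivity amounts to nothing more than multiplying the novel row vector with the estimated matrix:

    s <- matrix(c(2, 2), nrow = 1)           # novel form vector of (A18)
    s %*% Fmat                               # its predicted semantic vector (-2, 2), cf. (A19)
    # the same computation spelled out node by node, as in Figure 13:
    c(2 * 1 + 2 * -2, 2 * 2 + 2 * -1)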

Figure 13: The linear network corresponding to the transformation matrix $\mathbf{F}$. There are two input nodes, $i_1$ and $i_2$ (in blue; form features go in here), and two output nodes, $o_1$ and $o_2$ (in red; semantic vectors come out here); the connection weights on the edges are the transformation matrix. When the input vector $(3, 4)$ is presented to the network, the value at $o_1$ is $3 \times 1 + 4 \times -2 = -5$, and that at $o_2$ is $3 \times 2 + 4 \times -1 = 2$. The network maps the point $(3, 4)$ onto the point $(-5, 2)$.

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
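
The following self-contained R sketch illustrates the spirit of this path search on a toy example. The triphones and their support values are invented, and the thresholding and cross-validation details of the study are omitted; only the overlap graph, the path enumeration with all_simple_paths from igraph, and the length-weighted path support are shown.

    library(igraph)

    # invented network support for the triphones of 'hand' plus a competing branch
    support <- c("#ha" = 0.9, "han" = 0.8, "and" = 0.7, "nd#" = 0.8,
                 "ans" = 0.3, "ns#" = 0.2)
    triphones <- names(support)

    # edge from triphone u to triphone v when segments 2-3 of u equal segments 1-2 of v
    pairs <- expand.grid(from = triphones, to = triphones, stringsAsFactors = FALSE)
    pairs <- pairs[substr(pairs$from, 2, 3) == substr(pairs$to, 1, 2), ]
    g <- graph_from_data_frame(pairs, vertices = data.frame(name = triphones))

    initial <- triphones[substr(triphones, 1, 1) == "#"]   # word-initial triphones
    final   <- triphones[substr(triphones, 3, 3) == "#"]   # word-final triphones

    paths <- unlist(lapply(initial, function(v)
      all_simple_paths(g, from = v, to = final)), recursive = FALSE)

    # weighted path support: summed vertex support divided by path length
    weighted <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
    as_ids(paths[[which.max(weighted)]])     # "#ha" "han" "and" "nd#"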

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
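
Such a heatmap can be regenerated from any semantic matrix with a few lines of R. The sketch below is illustrative only: S is a random stand-in for the semantic matrix (in the study, the rows for pronouns and prepositions would be selected by name), and base R's heatmap is used rather than the exact plotting setup behind Figure 14.

    set.seed(1)
    S <- matrix(rnorm(20 * 50), nrow = 20,
                dimnames = list(paste0("word", 1:20), NULL))   # stand-in semantic matrix

    r <- cor(t(S))                     # pairwise correlations of the row (semantic) vectors
    r <- pmin(r, 0.25)                 # clamp at 0.25, mirroring the color range in the caption
    heatmap(r, symm = TRUE, scale = "none")   # clustered, color-coded correlation heatmap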


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dąbrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis, Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "Acousticndlcoder: Coding sound files for use with ndl," R package version 1.0.1, 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol, Ph.D. thesis, University of Alberta, Edmonton, 2010.


Page 3: The Discriminative Lexicon: A Unified Computational Model ...downloads.hindawi.com/journals/complexity/2019/4895891.pdf · as developed within computational linguistics and com-puter

Complexity 3

problem that confronts morpheme-based theories namelythat systematicities in form can exist without correspondingsystematicities in meaning Minimal variation in form canserve to distinguish words or to predict patterns of variationelsewhere in a paradigm or class One striking exampleis given by the locative cases in Estonian which expressmeanings that in English would be realized with locativeand directional prepositions Interestingly most plural caseendings in Estonian are built on the form of the partitivesingular (a grammatical case that can mark for instancethe direct object) However the semantics of these pluralcase forms do not express in any way the semantics of thesingular and the partitive For instance the form for thepartitive singular of the noun for lsquolegrsquo is jalga The formfor expressing lsquoon the legsrsquo jalgadele takes the form of thepartitive singular and adds the formatives for plural and thelocative case for lsquoonrsquo the so-called adessive (see [21 27]for further details) Matthews [28 p 92] characterized theadoption by Chomsky and Halle [29] of the morpheme aslsquoa remarkable tribute to the inertia of ideasrsquo and the samecharacterization pertains to the adoption of themorpheme bymainstream psychology and cognitive science In the presentstudy we adopt from Word and Paradigm morphology theinsight that the word is the basic unit of analysis Wherewe take Word and Paradigm morphology a step furtheris in how we operationalize proportional analogy As willbe laid out in more detail below we construe analogies asbeing system-wide rather than constrained to paradigms andour mathematical formalization better integrates semanticanalogies with formal analogies

22 Distributional Semantics The question that arises at thispoint is what word meanings are In this study we adoptthe approach laid out by Landauer and Dumais [30] andapproximate word meanings by means of semantic vectorsreferred to as embeddings in computational linguisticsWeaver [31] and Firth [32] noted that words with similardistributions tend to have similar meanings This intuitioncan be formalized by counting how often words co-occuracross documents or within some window of text In thisway a wordrsquos meaning comes to be represented by a vectorof reals and the semantic similarity between two words isevaluated by means of the similarities of their vectors Onesuch measure is the cosine similarity of the vectors a relatedmeasure is the Pearson correlation of the vectors Manyimplementations of the same general idea are available suchas hal [33] HiDEx [34] and word2vec [35] Below weprovide further detail on how we estimated semantic vectors

There are two ways in which semantic vectors can beconceptualized within the context of theories of the mentallexicon First semantic vectors could be fixed entities thatare stored in memory in a way reminiscent of a standardprinted or electronic dictionary the entries of which consistof a search key (a wordrsquos form) and a meaning specification(the information accessed through the search key) Thisconceptualization is very close to the currently prevalentway of thinking in psycholinguistics which has adopted aform of naive realism in which word meanings are typicallyassociated with monadic concepts (see eg [36ndash39]) It is

worth noting that when one replaces these monadic conceptsby semantic vectors the general organization of (paper andelectronic) dictionaries can still be retained resulting inresearch questions addressing the nature of the access keys(morphemes whole-words perhaps both) and the processof accessing these keys

However there is good reason to believe that wordmeanings are not fixed static representations The literatureon categorization indicates that the categories (or classes)that arise as we interact with our environment are constantlyrecalibrated [40ndash42] A particularly eloquent example isgiven by Marsolek [41] In a picture naming study hepresented subjects with sequences of two pictures He askedsubjects to say aloud the name of the second picture Thecritical manipulation concerned the similarity of the twopicturesWhen the first picture was very similar to the second(eg a grand piano and a table) responses were slowercompared to the control condition in which visual similaritywas reduced (orange and table) What we see at work hereis error-driven learning when understanding the picture ofa grand piano as signifying a grand piano features such ashaving a large flat surface are strengthened to the grand pianoandweakened to the table As a consequence interpreting thepicture of a table has become more difficult which in turnslows the word naming response

2.3 Discrimination Learning of Semantic Vectors. What is required, therefore, is a conceptualization of word meanings that allows for the continuous recalibration of these meanings as we interact with the world. To this end, we construct semantic vectors using discrimination learning. We take the sentences from a text corpus and, for each sentence in the order in which these sentences occur in the corpus, train a linear network to predict the words in that sentence from the words in that sentence. The training of the network is accomplished with a simplified version of the learning rule of Rescorla and Wagner [43], basically the learning rule of Widrow and Hoff [44]. We denote the connection strength from input word $i$ to output word $j$ by $w_{ij}$. Let $\delta_i$ denote an indicator variable that is 1 when input word $i$ is present in sentence $t$ and zero otherwise; likewise, let $\delta_j$ denote whether output word $j$ is present in the sentence. Given a learning rate $\rho$ ($\rho \ll 1$), the change in the connection strength from (input) word $i$ to (output) word $j$ for sentence (corpus time) $t$, $\Delta_{ij}$, is given by

$$\Delta_{ij} = \delta_i \, \rho \, \Bigl(\delta_j - \sum_k \delta_k w_{kj}\Bigr), \qquad (1)$$

where $\delta_i$ and $\delta_j$ vary from sentence to sentence, depending on which cues and outcomes are present in a sentence. Given $n$ distinct words, an $n \times n$ network is updated incrementally in this way. The row vectors of the network's weight matrix S define words' semantic vectors. Below, we return in more detail to how we derived the semantic vectors used in the present study. Importantly, we now have semantic representations that are not fixed but dynamic: they are constantly recalibrated from sentence to sentence. (The property of semantic vectors changing over time as experience unfolds is not unique to the present approach, but is shared with other algorithms for constructing semantic vectors, such as word2vec. However, time-variant semantic vectors are in stark contrast to the monadic units representing meanings in models such as those proposed by Baayen et al. [45], Taft [46], and Levelt et al. [37]. The distributed representations for words' meanings in the triangle model [47] were derived from WordNet and hence are also time-invariant.) Thus, in what follows, a word's meaning is defined as a semantic vector that is a function of time: it is subject to continuous change. By itself, without context, a word's semantic vector at time $t$ reflects its 'entanglement' with other words.
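For concreteness, the following minimal sketch in R implements one learning event of rule (1) for a toy set of lexomes. The vocabulary, the helper name update_sentence, and the learning rate are illustrative only; the semantic vectors reported below were estimated with the ndl2 package (see Section 3.1).

    ## One Rescorla-Wagner/Widrow-Hoff update of the lexome-to-lexome weight
    ## matrix for a single sentence: the lexomes of the sentence serve both
    ## as cues and as outcomes.
    lexomes <- c("the", "boy", "happiness", "be", "great", "to", "observe",
                 "PL", "NESS", "PAST")
    W <- matrix(0, length(lexomes), length(lexomes),
                dimnames = list(lexomes, lexomes))
    rho <- 0.001                                    # learning rate

    update_sentence <- function(W, sentence_lexomes, rho) {
      delta <- as.numeric(rownames(W) %in% sentence_lexomes)  # delta_i (= delta_j)
      predicted <- as.vector(delta %*% W)           # sum_k delta_k w_kj for each outcome j
      W + rho * outer(delta, delta - predicted)     # Delta_ij of equation (1)
    }

    ## one learning event: "the boys' happiness was great to observe"
    W <- update_sentence(W, c("the", "boy", "PL", "happiness", "NESS",
                              "be", "PAST", "great", "to", "observe"), rho)

After a sequence of such learning events, the row vectors of W are the (continuously recalibrated) semantic vectors.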

Obviously, it is extremely unlikely that our understanding of the classes and categories that we discriminate between is well approximated just by lexical co-occurrence statistics. However, we will show that text-based semantic vectors are good enough as a first approximation for implementing a computationally tractable discriminative lexicon.

Returning to the learning rule (1), we note that it is well known to capture important aspects of the dynamics of discrimination learning [48, 49]. For instance, it captures the blocking effect [43, 50, 51] as well as order effects in learning [52]. The blocking effect is the finding that when one feature has been learned to perfection, adding a novel feature does not improve learning. This follows from (1): as $w_{ij}$ tends towards 1, $\Delta_{ij}$ tends towards 0. A novel feature, even though by itself it is perfectly predictive, is blocked from developing a strong connection weight of its own. The order effect has been observed for, e.g., the learning of words for colors. Given objects of fixed shape and size but varying color, and the corresponding color words, it is essential that the objects are presented to the learner before the color word. This ensures that the object's properties become the features for discriminating between the color words. In this case, application of the learning rule (1) will result in strong connections from the color features of the object to the appropriate color word. If the order is reversed, the color words become features predicting object properties, and application of (1) will result in weights from a color word to object features that are proportional to the probabilities of the object's features in the training data.
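As an illustration of blocking, the following sketch applies rule (1) in a classic conditioning set-up in which cues and outcomes are kept distinct; the cue names A and B and the outcome X are purely illustrative.

    rw_update <- function(W, cues, outcomes, rho = 0.01) {
      delta_c <- as.numeric(rownames(W) %in% cues)      # cue indicators
      delta_o <- as.numeric(colnames(W) %in% outcomes)  # outcome indicators
      predicted <- as.vector(delta_c %*% W)             # sum_k delta_k w_kj
      W + rho * outer(delta_c, delta_o - predicted)     # equation (1)
    }

    W <- matrix(0, 2, 1, dimnames = list(c("A", "B"), "X"))
    for (i in 1:2000) W <- rw_update(W, cues = "A", outcomes = "X")
    W["A", "X"]     # close to 1: A alone fully predicts X
    for (i in 1:2000) W <- rw_update(W, cues = c("A", "B"), outcomes = "X")
    W[, "X"]        # B's weight remains near 0: the redundant cue is blocked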

Given the dynamics of discriminative learning, static lexical entries are not useful. Anticipating more detailed discussion below, we argue that the whole dictionary metaphor of lexical access is misplaced: in comprehension, meanings are dynamically created from form, and in production, forms are dynamically created from meanings. Importantly, what forms and meanings are created will vary from case to case, as all learning is fundamentally incremental and subject to continuous recalibration. In order to make the case that this is actually possible and computationally feasible, we first introduce the framework of naive discriminative learning, discuss its limitations, and then lay out how this approach can be enhanced to meet the goals of this study.

2.4 Naive Discriminative Learning. Naive discriminative learning (NDL) is a computational framework grounded in learning rule (1) that was inspired by prior work on discrimination learning (e.g., [52, 53]). A model for reading complex words was introduced by Baayen et al. [54]. This study implemented a two-layer linear network with letter pairs as input features (henceforth cues) and units representing meanings as output classes (henceforth outcomes). Following Word and Paradigm morphology, this model does not include form representations for morphemes or word forms. It is an end-to-end model for the relation between form and meaning that sets itself the task of predicting word meanings from sublexical orthographic features. Baayen et al. [54] were able to show that the extent to which cues in the visual input support word meanings, as gauged by the activation of the corresponding outcomes in the network, reflects a wide range of phenomena reported in the lexical processing literature, including the effects of frequency of occurrence, family size [55], relative entropy [56], phonaesthemes (nonmorphemic sounds with a consistent semantic contribution, such as gl in glimmer, glow, and glisten) [57], and morpho-orthographic segmentation [58].

However, the model of Baayen et al. [54] adopted a stark form of naive realism, albeit just for reasons of computational convenience: a word's meaning was defined in terms of the presence or absence of an outcome ($\delta_j$ in (1)). Subsequent work sought to address this shortcoming by developing semantic vectors, using learning rule (1) to predict words from words, as explained above [58, 59]. The term 'lexome' was introduced as a technical term for the outcomes in a form-to-meaning network, conceptualized as pointers to semantic vectors.

In this approach, lexomes do double duty. In a first network, lexomes are the outcomes that the network is trained to discriminate between, given visual input. In a second network, the lexomes are the 'atomic' units in a corpus that serve as the input for building a distributional model with semantic vectors. Within the theory unfolded in this study, the term lexome refers only to the elements from which a semantic vector space is constructed. The dimension of this vector space is equal to the number of lexomes, and each lexome is associated with its own semantic vector.

It turns out, however, that mathematically a network discriminating between lexomes given orthographic cues, as in naive discriminative learning models, is suboptimal. In order to explain why the set-up of NDL is suboptimal, we need some basic concepts and notation from linear algebra, such as matrix multiplication and matrix inversion. Readers unfamiliar with these concepts are referred to Appendix A, which provides an informal introduction.

2.4.1 Limitations of Naive Discriminative Learning. Mathematically, naive discriminative learning works with two matrices: a cue matrix C that specifies words' form features, and a matrix S that specifies the targeted lexomes [61]. We illustrate the cue matrix for four words, aaa, aab, abb, and abb, and their letter bigrams aa, ab, and bb. A cell $c_{ij}$ is 1 if word $i$ has bigram $j$ and zero otherwise:


$$
C = \begin{array}{c|ccc}
 & \text{aa} & \text{ab} & \text{bb} \\ \hline
\text{aaa} & 1 & 0 & 0 \\
\text{aab} & 1 & 1 & 0 \\
\text{abb} & 0 & 1 & 1 \\
\text{abb} & 0 & 1 & 1
\end{array} \qquad (2)
$$

The third and fourth words are homographs: they share exactly the same bigrams. We next define a target matrix S that specifies, for each of the four words, the corresponding lexome $\lambda$. This is done by setting one bit to 1 and all other bits to zero in the row vectors of this matrix:

$$
S = \begin{array}{c|cccc}
 & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 \\ \hline
\text{aaa} & 1 & 0 & 0 & 0 \\
\text{aab} & 0 & 1 & 0 & 0 \\
\text{abb} & 0 & 0 & 1 & 0 \\
\text{abb} & 0 & 0 & 0 & 1
\end{array} \qquad (3)
$$

The problem that arises at this point is that the word forms jointly set up a space with three dimensions, whereas the targeted lexomes set up a four-dimensional space. This is problematic for the following reason. A linear mapping of a lower-dimensional space $A$ onto a higher-dimensional space $B$ will result in a subspace of $B$ with a dimensionality that cannot be greater than that of $A$ [62]. If $A$ is a space in $\mathbb{R}^2$ and $B$ is a space in $\mathbb{R}^3$, a linear mapping of $A$ onto $B$ will result in a plane in $B$. All the points in $B$ that are not on this plane cannot be reached from $A$ with the linear mapping.

As a consequence, it is impossible for NDL to perfectly discriminate between the four lexomes (which set up a four-dimensional space) given their bigram cues (which jointly define a three-dimensional space). For the input bb, the network therefore splits its support equally over the lexomes $\lambda_3$ and $\lambda_4$. The transformation matrix F is

$$
F = C'S = \begin{array}{c|cccc}
 & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 \\ \hline
\text{aa} & 1 & 0 & 0 & 0 \\
\text{ab} & -1 & 1 & 0 & 0 \\
\text{bb} & 1 & -1 & 0.5 & 0.5
\end{array} \qquad (4)
$$

and the matrix with predicted vectors $\hat{S}$ is

$$
\hat{S} = CF = \begin{array}{c|cccc}
 & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 \\ \hline
\text{aaa} & 1 & 0 & 0 & 0 \\
\text{aab} & 0 & 1 & 0 & 0 \\
\text{abb} & 0 & 0 & 0.5 & 0.5 \\
\text{abb} & 0 & 0 & 0.5 & 0.5
\end{array} \qquad (5)
$$

When a cue matrix C and a target matrix S are set up for large numbers of words, the dimension of the NDL cue space will be substantially smaller than that of the space of S, the dimension of which is equal to the number of lexomes. Thus, in the set-up of naive discriminative learning, we do take into account that words tend to have similar forms, but we do not take into account that words are also similar in meaning. By setting up S as completely orthogonal, we are therefore making it unnecessarily hard to relate form to meaning.

This study therefore implements discrimination learning with semantic vectors of reals replacing the one-bit-on row vectors of the target matrix S. By doing so, we properly reduce the dimensionality of the target space, which in turn makes discriminative learning more accurate. We will refer to this new implementation of discrimination learning as linear discriminative learning (LDL), instead of naive discriminative learning, as the outcomes are no longer assumed to be independent and the networks are mathematically equivalent to linear mappings onto continuous target matrices.

2.5 Linear Transformations and Proportional Analogy. A central concept in Word and Paradigm morphology is that the form of a regular inflected word stands in a relation of proportional analogy to other words in the inflectional system. As explained by Matthews ([25], p. 192f):

In effect, we are predicting the inflections of servus by analogy with those of dominus. As Genitive Singular domini is to Nominative Singular dominus, so x (unknown) must be to Nominative Singular servus. What then is x? Answer: it must be servi. In notation, dominus : domini = servus : servi. ([25], p. 192f)

Here, form variation is associated with 'morphosyntactic features'. Such features are often naively assumed to function as proxies for a notion of 'grammatical meaning'. However, in Word and Paradigm morphology these features actually represent something more like distribution classes. For example, the accusative singular forms of nouns belonging to different declensions will typically differ in form, and be identified as the 'same' case in different declensions by virtue of distributional parallels. The similarity in meaning then follows from the distributional hypothesis, which proposes that linguistic items with similar distributions have similar meanings [31, 32]. Thus, the analogy of forms

$$\text{dominus} : \text{domini} = \text{servus} : \text{servi} \qquad (6)$$

is paralleled by an analogy of distributions $d$:

$$d(\text{dominus}) : d(\text{domini}) = d(\text{servus}) : d(\text{servi}). \qquad (7)$$

Borrowing notational conventions of matrix algebra, we can write

$$
\begin{pmatrix} \text{dominus} \\ \text{domini} \\ \text{servus} \\ \text{servi} \end{pmatrix}
\sim
\begin{pmatrix} d(\text{dominus}) \\ d(\text{domini}) \\ d(\text{servus}) \\ d(\text{servi}) \end{pmatrix}. \qquad (8)
$$

In this study, we operationalize the proportional analogy of Word and Paradigm morphology by replacing the word forms in (8) by vectors of features that are present or absent in a word, and by replacing words' distributions by semantic vectors. Linear mappings between form and meaning formalize two distinct proportional analogies: one analogy for going from form to meaning, and a second analogy for going from meaning to form. Consider, for instance, a semantic matrix S and a form matrix C, with rows representing words:

$$
S = \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}, \qquad
C = \begin{pmatrix} p_1 & p_2 \\ q_1 & q_2 \end{pmatrix}. \qquad (9)
$$

The transformation matrix G that maps S onto C,

$$
G = \underbrace{\frac{1}{a_1 b_2 - a_2 b_1}\begin{pmatrix} b_2 & -a_2 \\ -b_1 & a_1 \end{pmatrix}}_{S^{-1}}
\underbrace{\begin{pmatrix} p_1 & p_2 \\ q_1 & q_2 \end{pmatrix}}_{C}
= \frac{1}{a_1 b_2 - a_2 b_1}\left[\begin{pmatrix} b_2 p_1 & b_2 p_2 \\ -b_1 p_1 & -b_1 p_2 \end{pmatrix} + \begin{pmatrix} -a_2 q_1 & -a_2 q_2 \\ a_1 q_1 & a_1 q_2 \end{pmatrix}\right], \qquad (10)
$$

can be written as the sum of two matrices (both of which are scaled by $1/(a_1 b_2 - a_2 b_1)$). The first matrix takes the first form vector of C and weights it by the elements of the second semantic vector of S. The second matrix takes the second form vector of C and weights it by the elements of the first semantic vector of S. Thus, with this linear mapping, a predicted form vector is a semantically weighted mixture of the form vectors of C.

The remainder of this paper is structured as follows. We first introduce the semantic vector space S that we will be using, which we derived from the TASA corpus [63, 64]. Next, we show that we can model word comprehension by means of a matrix (or network) F that transforms the cue row vectors of C into the semantic row vectors of S, i.e., CF = S. We then show that we can model the production of word forms given their semantics by a transformation matrix G, i.e., SG = C, where C is a matrix specifying, for each word, which triphones it contains. As the network is informative only about which triphones are at issue for a word, but remains silent about their order, an algorithm building on graph theory is used to properly order the triphones. All models are evaluated on their accuracy and are also tested against experimental data.

In the general discussion, we will reflect on the consequences for linguistic theory of our finding that it is indeed possible to model the lexicon and lexical processing using systemic discriminative learning, without requiring 'hidden units' such as morphemes and stems.

3. A Semantic Vector Space Derived from the TASA Corpus

The first part of this section describes how we constructed semantic vectors. Specific to our approach is that we constructed semantic vectors not only for content words, but also for inflectional and derivational functions. The semantic vectors for these functions are of especial importance for the speech production component of the model. The second part of this section addresses the validation of the semantic vectors, for which we used paired associate learning scores, semantic relatedness ratings, and semantic plausibility and semantic transparency ratings for derived words.

3.1 Constructing the Semantic Vector Space. The semantic vector space that we use in this study is derived from the TASA corpus [63, 64]. We worked with 752,130 sentences from this corpus, to a total of 10,719,386 word tokens. We used the TreeTagger software [65] to obtain, for each word token, its stem and a part-of-speech tag. Compounds written with a space, such as apple pie and jigsaw puzzle, were, when listed in the CELEX lexical database [66], joined into one onomasiological unit. Information about words' morphological structure was retrieved from CELEX.

In computational morphology, several options have been explored for representing the meaning of affixes. Mitchell and Lapata [67] proposed to model the meaning of an affix as a vector that, when added to the vector b of a base word, gives the vector d of the derived word. They estimated this vector by calculating $\mathbf{d}_i - \mathbf{b}_i$ for all available pairs $i$ of base and derived words, and taking the average of the resulting vectors. Lazaridou et al. [68] and Marelli and Baroni [60] modeled the semantics of affixes by means of matrix operations that take b as input and produce d as output, i.e.,

$$\mathbf{d} = M\mathbf{b}, \qquad (11)$$

whereas Cotterell et al. [69] constructed semantic vectors as latent variables in a Gaussian graphical model. What these approaches have in common is that they derive or impute semantic vectors given the semantic vectors of words produced by algorithms such as word2vec, algorithms which work with (stemmed) words as input units.

From the perspective of discriminative linguistics, however, it does not make sense to derive one meaning from another. Furthermore, rather than letting writing conventions dictate what units are accepted as input to one's favorite tool for constructing semantic vectors (including not and again, but excluding the prefixes un-, in-, and re-), we included not only lexomes for content words but also lexomes for prefixes and suffixes as units for constructing a semantic vector space. As a result, semantic vectors for prefixes and suffixes are obtained straightforwardly, together with semantic vectors for content words.

To illustrate this procedure, consider the sentence the boys' happiness was great to observe. Standard methods will apply stemming and remove stop words, resulting in the lexomes boy, happiness, great, and observe being the input to the algorithm constructing semantic vectors from lexical co-occurrences. In our approach, by contrast, the lexomes considered are the, boy, happiness, be, great, to, and observe, and in addition plural, ness, and past. Stop words are retained. (Although including function words may seem counterproductive from the perspective of semantic vectors in natural language processing applications, they are retained in the present approach for two reasons. First, since in our model semantic vectors drive speech production, in order to model the production of function words, semantic vectors for function words are required. Second, although the semantic vectors of function words typically are less informative (they typically have small association strengths to very large numbers of words), they still show structure, as illustrated in Appendix C for pronouns and prepositions.) Inflectional endings are replaced by the inflectional functions they subserve, and for derived words, the semantic function of the derivational suffix is identified as well. In what follows, we outline in more detail what considerations motivate this way of constructing the input for our algorithm for constructing semantic vectors. We note here that our method for handling morphologically complex words can be combined with any of the currently available algorithms for building semantic vectors. We also note that we are not concerned with the question of how lexomes for plurality or tense might be induced from word forms, from world knowledge, or any combination of the two. All we assume is, first, that anyone understanding the sentence the boys' happiness was great to observe knows that more than one boy is involved and that the narrator situates the event in the past; second, we take this understanding to drive learning.

We distinguished the following inflectional functions: comparative and superlative for adjectives; singular and plural for nouns; and past, perfective, continuous, persistence (denoting the persistent relevance of the predicate in sentences such as London is a big city), and person3 (third person singular) for verbs.

The semantics of derived words tend to be characterized by idiosyncrasies [70] that can be accompanied by formal idiosyncrasies (e.g., business and resound), but often are formally unremarkable (e.g., heady, with meanings as diverse as intoxicating, exhilarating, impetuous, and prudent). We therefore paired derived words with their own content lexomes, as well as with a lexome for the derivational function expressed in the derived word. We implemented the following derivational lexomes: ordinal for ordinal numbers; not for negative un and in; undo for reversative un; other for nonnegative in and its allomorphs; ion for ation, ution, ition, and ion; ee for ee; agent for agents with er; instrument for instruments with er; impagent for impersonal agents with er; causer for words with er expressing causers (the differentiation of different semantic functions for er was based on manual annotation of the list of forms with er; the assignment of a given form to a category was not informed by sentence context and hence can be incorrect); again for re; and ness, ity, ism, ist, ic, able, ive, ous, ize, ence, ful, ish, under, sub, self, over, out, mis, and dis for the corresponding prefixes and suffixes. This set-up of lexomes is informed by the literature on the semantics of English word formation [71–73], and targets affixal semantics irrespective of affixal forms (following [74]).

This way of setting up the lexomes for the semantic vector space model illustrates an important aspect of the present approach, namely, that lexomes target what is understood, and not particular linguistic forms. Lexomes are not units of form. For example, in the study of Geeraert et al. [75], the forms die, pass away, and kick the bucket are all linked to the same lexome die. In the present study, we have not attempted to identify idioms, and likewise no attempt was made to disambiguate homographs. These are targets for further research.

In the present study, we did implement the classical distinction between inflection and word formation, by representing inflected words with a content lexome for their stem, but derived words with a content lexome for the derived word itself. In this way, the sentence scientists believe that exposure to shortwave ultraviolet rays can cause blindness is associated with the following lexomes: scientist, pl, believe, persistence, that, exposure, sg, to, shortwave, ultraviolet, ray, can, present, cause, blindness, and ness. In our model, sentences constitute the learning events; i.e., for each sentence, we train the model to predict the lexomes in that sentence from the very same lexomes in that sentence. Sentences (or, for spoken corpora, utterances) are more ecologically valid than windows of, say, two words preceding and two words following a target word, as the interpretation of a word may depend on the company that it keeps at long distances in a sentence. We also did not remove any function words, contrary to standard practice in distributional semantics (see Appendix C for a heatmap illustrating the kind of clustering observable for pronouns and prepositions). The inclusion of semantic vectors for both affixes and function words is necessitated by the general framework of our model, which addresses not only comprehension but also production, and which, in the case of comprehension, needs to move beyond lexical identification, as inflectional functions play an important role during the integration of words in sentence and discourse.

Only words with a frequency exceeding 8 occurrences in the TASA corpus were assigned lexomes. This threshold was imposed in order to avoid excessive data sparseness when constructing the distributional vector space. The number of different lexomes that met the criterion for inclusion was 23,562.

We constructed a semantic vector space by training an NDL network on the TASA corpus, using the ndl2 package for R [76]. Weights on lexome-to-lexome connections were recalibrated sentence by sentence, in the order in which the sentences appear in the TASA corpus, using the learning rule of NDL (i.e., a simplified version of the learning rule of Rescorla and Wagner [43]), which has only two free parameters: the maximum amount of learning $\lambda$ (set to 1) and a learning rate $\rho$ (set to 0.001). This resulted in a 23,562 × 23,562 matrix. Sentence-based training keeps the carbon footprint of the model down, as the number of learning events is restricted to the number of utterances (approximately 750,000) rather than the number of word tokens (approximately 10,000,000). The row vectors of the resulting matrix, henceforth S, are the semantic vectors that we will use in the remainder of this study. (Work is in progress to further enhance the S matrix by including word sense disambiguation and named entity recognition when setting up lexomes. The resulting lexomic version of the TASA corpus, and scripts (in Python as well as in R) for deriving the S matrix, will be made available at http://www.sfs.uni-tuebingen.de/~hbaayen.)
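A sketch of how such sentence-by-sentence training can be organized in base R is given below; it reuses the update_sentence helper from the sketch in Section 2.3, and assumes that sentences is a list of character vectors (one vector of lexomes per TASA sentence) and lexomes the 23,562 lexomes meeting the frequency threshold. The actual study used the ndl2 package, which implements this far more efficiently.

    train_semantic_space <- function(sentences, lexomes, rho = 0.001) {
      S <- matrix(0, length(lexomes), length(lexomes),
                  dimnames = list(lexomes, lexomes))
      for (s in sentences) {
        s <- intersect(s, lexomes)        # drop lexomes below the frequency threshold
        S <- update_sentence(S, s, rho)   # one learning event per sentence
      }
      S                                   # row vectors are the semantic vectors
    }

Since only the rows of the lexomes present in a sentence change, a practical implementation updates just those rows rather than the full 23,562 × 23,562 matrix.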

The weights on the main diagonal of this matrix tend to be high, unsurprisingly, as in each sentence on which the model is trained, each of the words in that sentence is an excellent predictor of that same word occurring in that sentence. When the focus is on semantic similarity, it is useful to set the main diagonal of this matrix to zero. However, for more general association strengths between words, the diagonal elements are informative and should be retained.

Unlike standard models of distributional semantics, we did not carry out any dimension reduction, using, for instance, singular value decomposition. Because we do not work with latent variables, each dimension of our semantic space is given by a column vector of S, and hence each dimension is linked to a specific lexome. Thus, the row vectors of S specify, for each of the lexomes, how well this lexome discriminates between all the lexomes known to the model.

Although semantic vectors of length 23,562 can provide good results, vectors of this length can be challenging for statistical evaluation. Fortunately, it turns out that many of the column vectors of S are characterized by extremely small variance. Such columns can be removed from the matrix without loss of accuracy. In practice, we have found that it suffices to work with approximately 4 to 5 thousand columns, selected by calculating column variances and using only those columns with a variance exceeding a threshold value.
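A minimal sketch of this reduction step, assuming S is available as a numeric matrix; the threshold shown is the cutoff used later for the visual comprehension study (Section 4.2.1), and the option of zeroing the diagonal reflects the remarks above about semantic similarity.

    prune_S <- function(S, threshold, zero_diagonal = TRUE) {
      if (zero_diagonal) diag(S) <- 0        # focus on between-lexome similarity
      v <- apply(S, 2, var)                  # variance of each column dimension
      S[, v > threshold, drop = FALSE]       # retain only the informative columns
    }
    ## e.g., S_reduced <- prune_S(S, threshold = 0.34e-7)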

3.2 Validation of the Semantic Vector Space. As shown by Baayen et al. [59] and Milin et al. [58, 77], measures based on matrices such as S are predictive for behavioral measures such as reaction times in the visual lexical decision task, as well as for self-paced reading latencies. In what follows, we first validate the semantic vectors of S on two data sets: one data set with accuracies in paired associate learning, and one data set with semantic similarity ratings. We then consider specifically the validity of semantic vectors for inflectional and derivational functions, by focusing first on the correlational structure of the pertinent semantic vectors, followed by an examination of the predictivity of the semantic vectors for semantic plausibility and semantic transparency ratings for derived words.

3.2.1 Paired Associate Learning. The paired associate learning (PAL) task is a widely used test in psychology for evaluating learning and memory. Participants are given a list of word pairs to memorize. Subsequently, at testing, they are given the first word and have to recall the second word. The proportion of correctly recalled words is the accuracy measure on the PAL task. Accuracy on the PAL test decreases with age, which has been attributed to cognitive decline over the lifetime. However, Ramscar et al. [78] and Ramscar et al. [79] provide evidence that the test actually measures the accumulation of lexical knowledge. In what follows, we use the data on PAL performance reported by desRosiers and Ivison [80]. We fitted a linear mixed model to accuracy in the PAL task as a function of the Pearson correlation r of paired words' semantic vectors in S (but with weights on the main diagonal included, and using the 4,275 columns with the highest variance), with random intercepts for word pairs, sex and age as control variables, and, crucially, an interaction of r by age. Given the findings of Ramscar and colleagues, we expect to find that the slope of r (which is always negative) as a predictor of PAL accuracy increases with age (indicating decreasing accuracy). Table 1 shows that this prediction is borne out. For age group 20–29 (the reference level of age), the slope for r is estimated at 0.31. For the next age level, this slope is adjusted upward by 0.12, and as age increases, these upward adjustments likewise increase, to 0.21, 0.24, and 0.32 for age groups 40–49, 50–59, and 60–69, respectively. Older participants know their language better and hence are more sensitive to the semantic similarity (or lack thereof) of the words that make up PAL test pairs. For the purposes of the present study, the solid support for r as a predictor of PAL accuracy contributes to validating the row vectors of S as semantic vectors.

[Figure 1. Similarity rating in the MEN dataset as a function of the correlation r between the row vectors in S of the words in the test pairs.]

3.2.2 Semantic Relatedness Ratings. We also examined the performance of S on the MEN test collection [81], which provides, for 3,000 word pairs, crowdsourced ratings of semantic similarity. For 2,267 word pairs, semantic vectors are available in S for both words. Figure 1 shows that there is a nonlinear relation between the correlation of words' semantic vectors in S and the ratings. The plot shows as well that for low correlations, the variability in the MEN ratings is larger. We fitted a Gaussian location-scale additive model to this data set, summarized in Table 2, which supported r as a predictor for both the mean MEN rating and the variability in the MEN ratings.
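The exact model call is not given in the text; a plausible specification with the mgcv package is sketched below, with men a hypothetical data frame holding the MEN ratings and the correlations r.

    library(mgcv)
    ## Gaussian location-scale additive model: both the mean rating and its
    ## variability are modeled as smooth functions of r.
    m <- gam(list(rating ~ s(r),   # location (mean)
                  ~ s(r)),         # scale (variability)
             family = gaulss(), data = men, method = "REML")
    summary(m)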

To put this performance in perspective, we collected the latent semantic analysis (LSA) similarity scores for the MEN word pairs, using the website at http://lsa.colorado.edu. The Spearman correlation for LSA scores and MEN ratings was 0.697, and that for r was 0.704 (both p < 0.0001). Thus, our semantic vectors perform on a par with those of LSA, a well-established older technique that still enjoys wide use in psycholinguistics. Undoubtedly, optimized techniques from computational linguistics such as word2vec will outperform our model. The important point here is that even with training on full sentences rather than using small windows, and even when including function words, performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.

Table 1. Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [-4.88, -2.53]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

A. parametric coefficients     Estimate   Std. Error   t-value    p-value
intercept                        3.6220       0.6822    5.3096   < 0.0001
r                                0.3058       0.1672    1.8293     0.0691
age=39                           0.2297       0.1869    1.2292     0.2207
age=49                           0.4665       0.1869    2.4964     0.0135
age=59                           0.6078       0.1869    3.2528     0.0014
age=69                           0.8029       0.1869    4.2970   < 0.0001
sex=male                        -0.1074       0.0230   -4.6638   < 0.0001
r:age=39                         0.1167       0.0458    2.5490     0.0117
r:age=49                         0.2090       0.0458    4.5640   < 0.0001
r:age=59                         0.2463       0.0458    5.3787   < 0.0001
r:age=69                         0.3239       0.0458    7.0735   < 0.0001
B. smooth terms                     edf       Ref.df    F-value    p-value
random intercepts word pair     17.8607      18.0000   128.2283   < 0.0001

Table 2. Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location: parameter estimating the mean; scale: parameter estimating the variance through the logb link function ($\eta = \log(\sigma - b)$).

A. parametric coefficients     Estimate   Std. Error    t-value    p-value
Intercept [location]            25.2990       0.1799   140.6127   < 0.0001
Intercept [scale]                2.1297       0.0149   143.0299   < 0.0001
B. smooth terms                     edf       Ref.df    F-value    p-value
TPRS r [location]                8.3749       8.8869  2766.2399   < 0.0001
TPRS r [scale]                   5.9333       7.0476    87.5384   < 0.0001

3.2.3 Correlational Structure of Inflectional and Derivational Vectors. Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner, and tend to be either not correlated with derivational lexomes (indicated by light yellow) or negatively correlated (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show, for a majority of pairs, small positive correlations, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3,500 derived words with 31 different formatives, resulting in a 3,500 × 23,562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82]. (For LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3,531 × 814.) LDA accuracy for this classification task with 31 possible outcomes was 0.72, and all 31 derivational lexomes were correctly classified.

[Figure 2. Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes. (Individual words become legible by zooming in on the figure at maximal magnification.)]

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across 10 permutations: 0.519–0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.
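The classification step can be sketched with the lda function from the MASS package; D (the matrix of semantic vectors sliced out of S for the derived words and formatives) and formative (a factor giving each row's affix) are assumed to be available, and the variance cutoff is illustrative.

    library(MASS)
    keep <- apply(D, 2, var) > 1e-8                  # drop near-constant columns so lda() can run
    fit  <- lda(x = D[, keep], grouping = formative)
    acc  <- mean(predict(fit, D[, keep])$class == formative)

    ## permutation baseline: break the link between vectors and formatives
    perm    <- sample(formative)
    fit_p   <- lda(x = D[, keep], grouping = perm)
    acc_rnd <- mean(predict(fit_p, D[, keep])$class == perm)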

How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and at the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype. This vector is close to the cloud of semantic vectors, but it is outside its periphery. This pattern is not specific to ness, but is found for the other derivational lexomes as well. It is intrinsic to our model.

[Figure 3. Heatmap for the correlation matrix of lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words become legible by zooming in on the figure at maximal magnification.)]

The reason for this is straightforward. During learning, although a derived word's lexome $i$ and its derivational lexome ness are co-present as cues, the derivational lexome also occurs in many other words $j$, and each time another such word $j$ is encountered, weights from ness to $i$ are reduced. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property of the model for capturing morphological productivity in comprehension and speech production.

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed over the vectors obtained by subtracting the vector of the base word from that of the derived word, the result is a vector that is embedded inside the cluster of derived vectors, and hence inherits semantic idiosyncrasies from all these derived words.

3.2.4 Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs consisting of a base and a novel derived form (e.g., accent–accentable) reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2,559 pairs, due to the constraint that the base words had to occur in the TASA corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].
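Activation diversity is straightforward to compute from S; a minimal sketch (the lexome name is illustrative):

    ## activation diversity = L1-norm of a lexome's semantic vector
    activation_diversity <- function(S, lexome) sum(abs(S[lexome, ]))
    ## e.g., activation_diversity(S, "NESS")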

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases, and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies more strong links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive, and were therefore left out of the specification of the model reported here. (Word frequency was not included as a predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve the model fit, p > 0.91.)

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by computing the correlation of (i) the semantic vector of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = -0.66), effectiveness (r = -0.51), awareness (r = -0.50), loneliness (r = -0.45), sickness (r = -0.44), and consciousness (r = -0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) and its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
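Such a comparison amounts to a one-line correlation, sketched below under the assumption that S has named rows for content and derivational lexomes (the row names are illustrative):

    ## transparency as the correlation between a derived word's own semantic
    ## vector and the sum of the vectors of its base and its affix;
    ## more negative values indicate more opaque derived words.
    transparency <- function(S, derived, base, affix) {
      cor(S[derived, ], S[base, ] + S[affix, ])
    }
    ## e.g., transparency(S, "happiness", "happy", "NESS")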

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the TASA corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall, transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output. (A package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz. The package is described in Baayen et al. [85].) We begin by introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension, and then turn to auditory comprehension.

4.1 Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

$$
C = \begin{array}{c|cccccccc}
 & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array} \qquad (12)
$$

where we use the DISC keyboard phonetic alphabet (the 'DIstinct Single Character' representation, with one character for each phoneme, was introduced by CELEX [86]); in the triphones, the symbol # denotes the word boundary.

[Figure 4. Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).]

Table 3. GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients                Estimate   Std. Error   t-value   p-value
Intercept                                  -1.8072       2.7560   -0.6557    0.5127
Word Length                                 0.5901       0.2113    2.7931    0.0057
Activation Diversity                        1.1221       0.6747    1.6632    0.0976
Arousal                                     0.3847       0.1521    2.5295    0.0121
Word Length × Activation Diversity         -0.1318       0.0500   -2.6369    0.0089
B. smooth terms                                 edf      Ref.df    F-value   p-value
Random Intercepts Affix                      3.3610      4.0000    55.6723  < 0.0001

Table 4. Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words; te: tensor product smooth, s: thin plate regression spline.

A. parametric coefficients                          Estimate   Std. Error    t-value   p-value
intercept [location]                                  5.5016       0.2564    21.4537  < 0.0001
intercept [scale]                                    -0.4135       0.0401   -10.3060  < 0.0001
B. smooth terms                                           edf       Ref.df    F-value   p-value
te(activation diversity, word length) [location]       5.3879       6.2897    23.7798    0.0008
s(derivational lexome) [location]                     14.3180      15.0000   380.6254  < 0.0001
s(activation diversity) [scale]                        2.1282       2.5993    28.4336  < 0.0001
s(word length) [scale]                                 1.7722       2.1403   132.7311  < 0.0001

Suppose that the semantic vectors for these words are the row vectors of the following matrix S:

$$
S = \begin{array}{c|ccc}
 & \text{one} & \text{two} & \text{three} \\ \hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array} \qquad (13)
$$

We are interested in a transformation matrix F such that

$$CF = S. \qquad (14)$$

The transformation matrix is straightforward to obtain. Let $C'$ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

$$F = C'S. \qquad (15)$$

For the present example,

$$
F = \begin{array}{c|ccc}
 & \text{one} & \text{two} & \text{three} \\ \hline
\text{\#wV} & 0.33 & 0.10 & 0.13 \\
\text{wVn}  & 0.33 & 0.10 & 0.13 \\
\text{Vn\#} & 0.33 & 0.10 & 0.13 \\
\text{\#tu} & 0.10 & 0.50 & 0.05 \\
\text{tu\#} & 0.10 & 0.50 & 0.05 \\
\text{\#Tr} & 0.03 & 0.03 & 0.33 \\
\text{Tri}  & 0.03 & 0.03 & 0.33 \\
\text{ri\#} & 0.03 & 0.03 & 0.33
\end{array} \qquad (16)
$$

For this simple example, CF is exactly equal to S.
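The toy example can be verified directly with ginv; the sketch below rebuilds C and S from (12) and (13), with the variable name Fmat standing in for F.

    library(MASS)
    triphones <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
    C <- rbind(one   = c(1, 1, 1, 0, 0, 0, 0, 0),
               two   = c(0, 0, 0, 1, 1, 0, 0, 0),
               three = c(0, 0, 0, 0, 0, 1, 1, 1))
    colnames(C) <- triphones
    S <- rbind(one   = c(1.0, 0.3, 0.4),
               two   = c(0.2, 1.0, 0.1),
               three = c(0.1, 0.1, 1.0))
    colnames(S) <- c("one", "two", "three")
    Fmat <- ginv(C) %*% S        # equation (15): F = C'S
    round(Fmat, 2)               # compare with equation (16)
    round(C %*% Fmat, 2)         # recovers S exactly for this toy example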

In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams. A comparison of the performance of the two models sheds light on the role of words' 'sound image' in word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector $\hat{s}$, obtained by multiplying an observed cue vector c with the transformation matrix F ($\hat{s} = cF$), with the corresponding target row vector s of S. A word $i$ is counted as correctly recognized when $\hat{s}_i$ is most strongly correlated with the target semantic vector $s_i$ of all target vectors $s_j$ across all words $j$.
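A sketch of this evaluation criterion, assuming a cue matrix C, a mapping Fmat, and a target matrix S whose rows are in the same word order (for the full 11,480-word lexicon the correlation matrix would in practice be computed in blocks):

    Shat <- C %*% Fmat                    # predicted semantic vectors, one row per word
    R    <- cor(t(Shat), t(S))            # R[i, j] = cor(Shat[i, ], S[j, ])
    best <- max.col(R)                    # index of the best-correlated target vector
    mean(best == seq_len(nrow(S)))        # proportion of words correctly recognized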

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2 Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3,987 monolexomic English words, 6,595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, to a total of 11,480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), (iii) the word being available in the CELEX database, and (iv) the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1 The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3,465 unique trigrams, resulting in an 11,480 × 3,465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5,030 column vectors with the highest variances, setting the cutoff value for the minimal variance to $0.34 \times 10^{-7}$. From S and C, we derived the transformation matrix F, which we used to obtain estimates $\hat{s}$ of the semantic vectors s in S.
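For concreteness, a letter-trigram cue matrix of this kind can be built as in the sketch below; the helper names are illustrative, the use of '#' as an orthographic word-boundary marker is an assumption, and nothing here is taken from the authors' software.

    ## letter trigrams of a word, with '#' marking the word boundary
    letter_trigrams <- function(word) {
      w <- paste0("#", word, "#")
      substring(w, 1:(nchar(w) - 2), 3:nchar(w))
    }
    build_cue_matrix <- function(words) {
      grams <- lapply(words, letter_trigrams)
      cues  <- sort(unique(unlist(grams)))
      C <- matrix(0, length(words), length(cues),
                  dimnames = list(words, cues))
      for (i in seq_along(words)) C[i, grams[[i]]] <- 1   # 1 if word i contains cue j
      C
    }
    ## e.g., C <- build_cue_matrix(c("one", "two", "three"))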

For 59% of the words, $\hat{s}$ had the highest correlation with the targeted semantic vector s. (For the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; for the incorrectly predicted words, the mean and median correlation were both 0.48.) The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, performance of linear discriminative learning (LDL) is substantially better than that of naive discriminative learning (NDL).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated in the mean with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all 6595) than derived words (898), whereas the number of different inflectional functions (7) was much smaller than the number of derivational functions (24). However, the negative correlations of the predicted semantic vectors with the semantic vectors of the derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical of derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2 The Indirect Route, from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology also plays a role during the reading process. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves a speech imagery component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping a 11480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual-route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words, the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have, for each word, two estimated semantic vectors: a vector ŝ1 obtained with network F of the first route, and a vector ŝ2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor of lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
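The three measures can be sketched in R as follows, under the assumption that Shat1 and Shat2 hold, row by row, the semantic vectors estimated by the direct route (network F) and the indirect route (network H), that S is the lexome-by-lexome semantic matrix, and that lexomes names the lexome corresponding to each word; none of these object names stem from the original implementation.

total_activation_diversity <- rowSums(abs(Shat1)) + rowSums(abs(Shat2))  # sum of the two L1-norms
route_congruency <- sapply(seq_len(nrow(Shat1)),
                           function(i) cor(Shat1[i, ], Shat2[i, ]))      # correlation of the two estimates
prior <- colSums(abs(S))[lexomes]                                        # L1-norm of the lexome's column in S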

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel): for all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
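A model of this form can be specified with the mgcv package along the following lines; the data frame blp and its column names are our own placeholders for the BLP latencies and the measures defined above, not the authors' actual code.

library(mgcv)
# word_type is a factor with reference level 'derived'; invRT holds the
# inverse-transformed response latencies
m_ldl <- bam(invRT ~ word_type + word_length +
               s(total_activation_diversity) +      # thin plate regression spline
               te(route_congruency, prior),         # tensor product smooth
             data = blp)
summary(m_ldl)    # corresponds to the coefficients and smooth terms of Table 5
AIC(m_ldl)        # basis for the comparison with the ndl-based GAM below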

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors of RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be derived independently from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity increases (left panel). Total activation also interacted with the total prior (right panel): for medium-range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 651.01 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of the ldl measures for human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increased by 100.59 units. However, when the model is based on only the activation diversity of ŝ1, the AIC increased by no less than 147.90 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger


Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients        Estimate    Std. Error    t-value     p-value
intercept                         -1.7774     0.0087        -205.414    < 0.0001
word type = Inflected              0.1110     0.0059          18.742    < 0.0001
word type = Monomorphemic          0.0233     0.0054           4.322    < 0.0001
word length                        0.0117     0.0011          10.981    < 0.0001

B. smooth terms                   edf         Ref.df        F           p-value
s(total activation diversity)      6.002      7.222          23.6       < 0.0001
te(route congruency, prior)       14.673     17.850          21.37      < 0.0001

Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients                       Estimate    Std. Error    t-value     p-value
intercept                                        -1.6934     0.0086        -195.933    < 0.0001
word type = Inflected                             0.0286     0.0052           5.486    < 0.0001
word type = Monomorphemic                         0.0042     0.0055           0.770    = 0.4
word length                                       0.0066     0.0011           5.893    < 0.0001

B. smooth terms                                  edf         Ref.df        F           p-value
te(activation, activation diversity, prior)      41.34       49.72          10.84      < 0.0001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual-route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as its primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3 Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises of what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL-scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk, and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts,


Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as 'clean' for relatively clean parts where there is speech without background noise or music, and as 'noisy' for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus, (3 8; 7 2)^T = (3 7; 8 2). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus, a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

    C F = S
    C^T C F = C^T S
    (C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
    F = (C^T C)^{-1} C^T S        (17)

In this way, matrix inversion is required for a much smaller matrix, C^T C, which is a square matrix of size 40639 × 40639.
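In R, this shortcut amounts to a single call to solve() on the cross-product matrices; the sketch below assumes dense numeric matrices C and S, whereas for the actual 131673 × 40639 data sparse-matrix representations would be required.

Fmat <- solve(crossprod(C), crossprod(C, S))   # solves (C^T C) F = C^T S, i.e., (17)
Shat <- C %*% Fmat                             # estimated semantic vectors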

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72; for the incorrectly identified words, it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model: when presented with utterances instead of single words, it performs remarkably well.
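The evaluation procedure can be sketched as follows, assuming that Shat holds the predicted and Sgold the targeted (gold standard) semantic vectors, one row per audio token; these object names are ours, and for the full set of 131673 tokens the correlation matrix would have to be computed in chunks.

R <- cor(t(Shat), t(Sgold))                               # token-by-token correlation matrix
recognized <- apply(R, 1, which.max) == seq_len(nrow(R))  # target has the highest correlation
mean(recognized)                                          # precision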

4.4 Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing a classification task with 11480 classes is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5 Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monolexomic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1 Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as the elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

    S =
              one    two    three
    one       1.0    0.3    0.4
    two       0.2    1.0    0.1
    three     0.1    0.1    1.0               (18)

as well as the C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to it as the T matrix:


    T =
              #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
    one        1     1     1     0     0     0     0     0
    two        0     0     0     1     1     0     0     0
    three      0     0     0     0     0     1     1     1      (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict, for any semantic vector s, the vector t̂ that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t̂ = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T (22)

we have

G = S′T (23)

Given G, we can predict the triphone matrix from the semantic matrix S:

SG = T̂ (24)

For the present example, the inverse S′ of S is

    S′ =
              one     two     three
    one       1.10   -0.29   -0.41
    two      -0.21    1.07   -0.02
    three    -0.09   -0.08    1.04            (25)

and the transformation matrix G is

    G =
              #wV     wVn     Vn#     #tu     tu#     #Tr     Tri     ri#
    one       1.10    1.10    1.10   -0.29   -0.29   -0.41   -0.41   -0.41
    two      -0.21   -0.21   -0.21    1.07    1.07   -0.02   -0.02   -0.02
    three    -0.09   -0.09   -0.09   -0.08   -0.08    1.04    1.04    1.04      (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
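The toy example can be reproduced in a few lines of R with MASS::ginv(); the matrices entered below are the S and T matrices given in (18) and (19).

library(MASS)
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"), c("one", "two", "three")))
tri <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
Tmat <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
                 0, 0, 0, 1, 1, 0, 0, 0,
                 0, 0, 0, 0, 0, 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(rownames(S), tri))   # the T matrix of (19)
Sprime <- ginv(S)                # Moore-Penrose generalized inverse, cf. (25)
G <- Sprime %*% Tmat             # transformation matrix, cf. (26)
round(S %*% G, 2)                # predicted triphone support, (24); here equal to T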

We made use of the S matrix, which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero, and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies, for each of w words, an n-dimensional semantic vector.

5.2 Model Performance

5.2.1 Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m); those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words: targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines the word's form.
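This ordering step can be sketched as follows with igraph; tri_act is assumed to be a named vector with the triphone activations of a single word (e.g., c("#wV" = 1.00, "wVn" = 0.99, ...)), with phones written as single characters as in the DISC-style notation used above.

library(igraph)
active <- names(tri_act)[tri_act > 0.99]              # activation threshold of 0.99
# an edge u -> v whenever the last two phones of u equal the first two of v
pairs <- expand.grid(from = active, to = active, stringsAsFactors = FALSE)
pairs <- pairs[substr(pairs$from, 2, 3) == substr(pairs$to, 1, 2) &
               pairs$from != pairs$to, ]
g      <- graph_from_data_frame(pairs, directed = TRUE,
                                vertices = data.frame(name = active))
starts <- active[substr(active, 1, 1) == "#"]         # word-initial triphones
ends   <- active[substr(active, 3, 3) == "#"]         # word-final triphones
paths  <- unlist(lapply(starts, function(v)
            all_simple_paths(g, from = v, to = ends)), recursive = FALSE)
as_ids(paths[[which.max(sapply(paths, length))]])     # the longest path: ordered triphones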

We also evaluated model performance with a second, more general heuristic algorithm that makes use of the same function from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the Appendix, and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second, due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2 Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of the monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_m^a from S_m and add the semantic vector s_a of the affix:

S_a = S_m^a + i ⊗ s_a        (27)

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a        (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors of both base words and inflected words:

    [ S_m ]          [ T_m ]
    [ S_a ]  G_a  =  [ T_a ]        (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function, a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for the base words and all inflected words jointly:

    [ S_m  ]          [ T_m  ]
    [ S_a1 ]          [ T_a1 ]
    [ S_a2 ]   G   =  [ T_a2 ]        (30)
    [ ...  ]          [ ...  ]
    [ S_an ]          [ T_an ]

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
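For a single inflectional function a, the construction of (27) and the stacking of (30) can be sketched in R as follows; S_m, T_m, T_a, s_a, and base_idx (the rows of S_m corresponding to the base words of the attested inflected forms) are assumed inputs, and further inflectional functions are stacked in the same way.

S_a   <- S_m[base_idx, , drop = FALSE] +
         matrix(s_a, nrow = length(base_idx), ncol = length(s_a), byrow = TRUE)  # (27): add the affix vector to each base row
S_aug <- rbind(S_m, S_a)                  # stack base and inflected rows, cf. (30)
T_aug <- rbind(T_m, T_a)
G     <- MASS::ginv(S_aug) %*% T_aug      # a single mapping G for all rows
That  <- S_aug %*% G                      # predicted triphone support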

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles; we followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that the targeted triphones had the top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for the inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus,


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.

The proportion of forms that were predicted correctly was 0.62; the proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense form blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouths does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3 Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 columns, and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our dataset (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences; they are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3 Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth the 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as the predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
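A model of this kind can be specified with mgcv as sketched below; buckeye and its column names are our own labels for the Buckeye-derived measures described above, and word and speaker are assumed to be factors.

library(mgcv)
m_dur <- bam(log(rel_duration) ~ log(ldl_edge_weight) + phrase_rate + log(ncount) +
               s(log(frequency)) +        # thin plate regression spline for frequency
               s(word, bs = "re") +       # random intercepts for word
               s(speaker, bs = "re"),     # random intercepts for speaker
             data = buckeye)
summary(m_dur)                            # corresponds to the partial effects of Table 7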

The adverse effect of weaknesses in a word's path in the graph, at points where the path branches, is of interest in the context of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as the one illustrated in Figure 9, and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients     Estimate     Std. Error    t-value     p-value
Intercept                      -1.9165      0.0592        -32.3908    < 0.0001
Log LDL Edge Weight            -0.1358      0.0400         -3.3960      0.0007
Phrase Rate                     0.0047      0.0021          2.2449      0.0248
Log Ncount                      0.1114      0.0189          5.8837    < 0.0001

B. smooth terms                edf          Ref.df        F-value     p-value
TPRS log Frequency                1.9007       1.9050       7.5495      0.0004
random intercepts word         1149.8839    1323.0000      31.4919    < 0.0001
random intercepts speaker        30.2995      39.0000      34.5579    < 0.0001

maps (TSOMs, [139–141]). TSOMs also use error-driven learning, and in recent work, attention has also been drawn to weak edges in words' paths in the TSOM, and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout: each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4 Discussion. Our production network reconstructs known words' forms with high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm, but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to prevent graphs from becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one: it is not a repository of stored forms, but reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual-route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization to novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content


lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., [tyk] for Dutch natuurlijk, [natyrlək]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6 Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise, articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically, compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
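The demonstration can be sketched in R as follows; the objects C (orthographic words by letter trigrams), L (words by semantic dimensions, each row holding the semantic vector of the word's lexome), and S (the trained semantic vectors of the lexomes, with rows such as JohnWelsch) are assumed to have been built from the sentences of Table 8 and carry our own labels.

library(MASS)
Fmat <- ginv(C) %*% L                     # least-squares solution of CF = L
Lhat <- C %*% Fmat                        # estimated semantic vectors per word form
# reading "John" followed by "Welsch": sum the two estimated vectors and
# correlate the result with the semantic vectors of the named entities
composite <- Lhat["John", ] + Lhat["Welsch", ]
sort(apply(S, 1, cor, y = composite), decreasing = TRUE)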

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As, in the small world of the present toy example, Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7 General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features; for features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

JohnClark JohnWelsch JohnWiggam AnneHastie JanetClark

John

Pear

son

corr

elat

ion

minus0

50

00

51

0

JohnClark JohnWelsch JohnWiggam AnneHastie JanetClark

John Welsch

Pear

son

corr

elat

ion

minus0

50

00

51

0

JohnClark JohnWelsch JohnWiggam AnneHastie JanetClark

Janet

Pear

son

corr

elat

ion

minus0

50

00

51

0

JohnClark JohnWelsch JohnWiggam AnneHastie JanetClark

Clark

Pear

son

corr

elat

ion

minus0

50

00

51

0

Figure 10 Correlations of estimated semantic vectors for John JohnWelsch Janet andClarkwith the targeted semantic vectors of JohnClarkJohnWelsch JohnWiggam AnneHastie and JaneClark

can be predicted from ndl networks using auditory targets todiscriminate between lexomes [150]

Speech production is modeled by network G119886 whichmaps semantic vectors onto auditory targets which in turnserve as input for articulation Although current modelsof speech production assemble articulatory targets fromphonemes (see eg [126 147]) a possibility that is certainlycompatible with our model when phones are recontextual-ized as triphones we think that it is worthwhile investigatingwhether a network mapping semantic vectors onto vectors ofarticulatory parameters that in turn drive the articulators canbe made to work especially since many reduced forms arenot straightforwardly derivable from the phone sequences oftheir lsquocanonicalrsquo forms [111 144]

We have shown that the networks of our model havereasonable to high production and recognition accuraciesand also generate a wealth of well-supported predictions forlexical processing Central to all networks is the hypothesisthat the relation between form and meaning can be modeleddiscriminatively with large but otherwise surprisingly simplelinear networks the underlyingmathematics of which (linearalgebra) is well-understood Given the present results it isclear that the potential of linear algebra for understandingthe lexicon and lexical processing has thus far been severelyunderestimated (the model as outlined in Figure 11 is amodular one for reasons of computational convenience andinterpretabilityThe different networks probably interact (seefor some evidence [47]) One such interaction is explored

28 Complexity

typing speaking

orthographic targets

trigrams

auditory targets

triphones

semantic vectors

TASA

auditory cues

FBS features

orthographic cues

trigrams

cochlea retina

To Ta

Ha

GoGa

KaS

Fa Fo

CoCa

Figure 11 Overview of the discriminative lexicon Input and output systems are presented in light gray the representations of the model areshown in dark gray Each of these representations is a matrix with its own set of features Arcs are labeled with the discriminative networks(transformationmatrices) thatmapmatrices onto each otherThe arc from semantic vectors to orthographic vectors is in gray as thismappingis not explored in the present study

in Baayen et al [85] In their production model for Latininflection feedback from the projected triphone paths backto the semantics (synthesis by analysis) is allowed so thatof otherwise equally well-supported paths the path that bestexpresses the targeted semantics is selected For informationflows between systems in speech production see Hickok[147]) In what follows we touch upon a series of issues thatare relevant for evaluating our model

Incremental learning In the present study we esti-mated the weights on the connections of these networkswith matrix operations but importantly these weights canalso be estimated incrementally using the learning rule ofWidrow and Hoff [44] further improvements in accuracyare expected when using the Kalman filter [151] As allmatrices can be updated incrementally (and this holds aswell for the matrix with semantic vectors which are alsotime-variant) and in theory should be updated incrementally

whenever information about the order of learning events isavailable the present theory has potential for studying lexicalacquisition and the continuing development of the lexiconover the lifetime [79]

Morphology without compositional operations Wehave shown that our networks are predictive for a rangeof experimental findings Important from the perspective ofdiscriminative linguistics is that there are no compositionaloperations in the sense of Frege and Russell [1 2 152] Thework on compositionality in logic has deeply influencedformal linguistics (eg [3 153]) and has led to the beliefthat the ldquoarchitecture of the language facultyrdquo is groundedin a homomorphism between a calculus (or algebra) ofsyntactic representations and a calculus (or algebra) based onsemantic primitives Within this tradition compositionalityarises when rules combining representations of form arematched with rules combining representations of meaning

Complexity 29

The approach of Marelli and Baroni [60] who derive thesemantic vector of a complex word from the semantic vectorof its base and a dedicated affix-specific mapping from thepertinent base words to the corresponding derived wordsis in the spirit of compositionality in that a change in form(through affixation) goes hand in hand with a change inmeaning (modeled with a linear mapping in semantic space)

The perspective on morphological processing developedin the present study breaks with this tradition Althoughsemantic vectors for inflected words are obtained by summa-tion of the semantic vectors of the base word and those ofits inflectional functions (see [85] for modeling of the richverbal paradigms of Latin) the form vectors correspondingto the resulting semantic vectors are not obtained by thesummation of vectors for chunks of form nor by any otheroperations on forms Attempts to align chunks of wordsrsquo formwith their semantic structures mdash typically represented bygraphs constructed over semantic primitives (see eg [154])mdash are destined to end in quagmires of arbitrary decisionsabout the number of inflectional classes required (Estonian30 or 300) what the chunks should look like (does Germanhave a morpheme d in its articles and demonstratives derdie das diese ) whether Semitic morphology requiresoperations on form on one or two tiers [155ndash157] andhow many affixal position slots are required for Athebascanlanguages such asNavajo [158] A language that is particularlypainful for compositional theories is Yelı Dnye an Indo-Pacific language spoken on Rossel island (in the south-east ofPapuaNewGuinea)The language has a substantial inventoryof over a 1000 function words used for verb agreementTypically these words are monosyllables that simultaneouslyspecify values for negation tense aspect person and numberof subject deixis evidentiality associated motion and coun-terfactuality Furthermore verbs typically have no less thaneight different stems the use of which depends on particularconstellations of inflectional functions [159 160] Yelı Dnyethus provides a striking counterexample to the hypothesisthat a calculus of meaning would be paralleled by a calculusof form

Whereas the theory of linear discriminative learningradically rejects the idea that morphological processing isgrounded in compositionality as defined in mainstream for-mal linguistic theory it does allow for the conjoint expressionin form of lexical meanings and inflectional functions andfor forms to map onto such conjoint semantic vectors Butjust asmultiple inflectional functions can be expressed jointlyin a single word multiple words can map onto nonconjointsemantic vectors as was illustrated above for named entityrecognition John Wiggam can be understood to refer to avery specific person without this person being a composi-tional function of the proper name John and the surnameWiggam In both these cases there is no isomorphy betweeninflectional functions and lexical meanings on the one handand chunks of form on the other hand

Of course conjoint vectors for lexical meanings andinflectional functions can be implemented andmade to workonly because a Latin verb form such as sapıvissemus isunderstood when building the semantic space S to expresslexomes for person (first) number (plural) voice (active)

tense (pluperfect) mood (subjunctive) and lexical meaning(to know) (see [85] for computational modeling details)Although this could loosely be described as a decomposi-tional approach to inflected words we prefer not to use theterm lsquodecompositionalrsquo as in formal linguistics this termas outlined above has a very different meaning A moreadequate terminology would be that the system is conjointrealizational for production and conjoint inferential for com-prehension (we note here that we have focused on temporallyconjoint realization and inference leaving temporally disjointrealization and inference for sequences of words and possiblycompounds and fixed expression for further research)

Naive versus linear discriminative learning There isone important technical difference between the learningengine of the current study ldl and naive discriminativelearning (ndl) Because ndl makes the simplifying assump-tion that outcomes are orthogonal the weights from the cuesto a given outcome are independent of the weights from thecues to the other outcomes This independence assumptionwhich motivates the word naive in naive discriminativelearning makes it possible to very efficiently update networkweights during incremental learning In ldl by contrast thisindependence no longer holds learning is no longer naiveWe therefore refer to our networks not as ndl networksbut simply as linear discriminative learning networks Actualincremental updating of ldl networks is computationallymore costly but further numerical optimization is possibleand implementations are underway

Scaling An important property of the present approachis that good results are obtained already for datasets ofmodestsize Our semantic matrix is derived from the tasa corpuswhich with only 10 million words is dwarfed by the 28 billionwords of the ukWaC corpus used by Marelli and Baroni [60]more words than any individual speaker can ever encounterin their lifetime Similarly Arnold et al [18] report goodresults for word identification when models are trained on20 hours of speech which contrasts with the huge volumesof speech required for training deep learning networks forspeech recognition Although performance increases some-whatwithmore training data [161] it is clear that considerableheadway can be made with relatively moderate volumesof speech It thus becomes feasible to train models onfor instance Australian English and New Zealand Englishusing already existing speech corpora and to implementnetworks that make precise quantitative predictions for howunderstanding is mediated by listenersrsquo expectations abouttheir interlocutors (see eg [162])

In this study we have obtained good results with asemantic matrix with a dimensionality of around 4000lexomes By itself we find it surprising that a semantic vectorspace can be scaffolded with a mere 4000 words But thisfindingmay also shed light on the phenomenon of childhoodamnesia [163 164] Bauer and Larkina [165] pointed outthat young children have autobiographical memories thatthey may not remember a few years later They argue thatthe rate of forgetting diminishes as we grow older andstabilizes in adulthood Part of this process of forgettingmay relate to the changes in the semantic matrix Cruciallywe expect the correlational structure of lexomes to change

30 Complexity

considerably over time as experience accumulates If this isindeed the case the semantic vectors for the lexomes foryoung children are different from the corresponding lexomesfor older children which will again differ from those ofadults As a consequence autobiographical memories thatwere anchored to the lexomes at a young age can no longerbe accessed as the semantic vectors to which these memorieswere originally anchored no longer exist

No examplars Given how we estimate the transforma-tion matrices with which we navigate between form andmeaning it might seem that our approach is in essence animplementation of an exemplar-based theory of the mentallexicon After all for instance the C119886 matrix specifies foreach auditory word form (exemplar) its pertinent frequencyband summary features However the computational imple-mentation of the present study should not be confused withthe underlying algorithmic conceptualization The underly-ing conceptualization is that learning proceeds incrementallytrial by trial with the weights of transformation networksbeing updated with the learning rule of Widrow and Hoff[44] In the mean trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connectionweights as obtained by the present estimates using thegeneralized inverse The important conceptual advantage ofincremental learning is however that there is no need to storeexemplars By way of example for auditory comprehensionacoustic tokens are given with the input to the learningsystemThese tokens the ephemeral result of interactingwiththe environment leave their traces in the transformationmatrix F119886 but are not themselves stored The same holdsfor speech production We assume that speakers dynamicallyconstruct the semantic vectors from their past experiencesrather than retrieving these semantic vectors from somedictionary-like fixed representation familiar from currentfile systems The dynamic nature of these semantic vectorsemerged loud and clear from the modeling of novel complexwords for speech production under cross-validation In otherwords our hypothesis is that all input and output vectors areephemeral in the sense that they represent temporary statesof the cognitive system that are not themselves represented inthat system but that leave their traces in the transformationmatrices that jointly characterize and define the cognitivesystem that we approximate with the discriminative lexicon

This dynamic perspective on the mental lexicon asconsisting of several coupled networks that jointly bringthe cognitive system into states that in turn feed into fur-ther networks (not modeled here) for sentence integration(comprehension) and articulation (production) raises thequestion about the status of units in this theory Units playan important role when constructing numeric vectors forform and meaning Units for letter trigrams and phonetrigrams are of course context-enriched letter and phoneunits Content lexomes as well as lexomes for derivationaland inflectional functions are crutches for central semanticfunctions ranging from functions realizing relatively concreteonomasiological functions (lsquothis is a dog not a catrsquo) toabstract situational functions allowing entities events andstates to be specified on the dimensions of time spacequantity aspect etc However crucial to the central thrust of

the present study there are no morphological symbols (unitscombining form andmeaning) normorphological formunitssuch as stems and exponents of any kind in the modelAbove we have outlined the many linguistic considerationsthat have led us not to want to incorporate such units aspart of our theory Importantly as there are no hidden layersin our model it is not possible to seek for lsquosubsymbolicrsquoreflexes of morphological units in patterns of activation overhidden layers Here we depart from earlier connectionistmodels which rejected symbolic representations but retainedthe belief that there should be morphological representationalbeit subsymbolic ones Seidenberg and Gonnerman [166]for instance discuss morphological representations as beinglsquointerlevel representationsrsquo that are emergent reflections ofcorrelations among orthography phonology and semanticsThis is not to say that themore agglutinative a language is themore the present model will seem to the outside observer tobe operating withmorphemes But themore a language tendstowards fusional or polysyntheticmorphology the less strongthis impression will be

Model complexity Next consider the question of howdifficult lexical processing actually is Sidestepping the com-plexities of the neural embedding of lexical processing [126147] here we narrow down this question to algorithmiccomplexity State-of-the-art models in psychology [37 114167] implementmany layers of hierarchically organized unitsand many hold it for an established fact that such units arein some sense psychologically real (see eg [19 91 168])However empirical results can be equally well understoodwithout requiring the theoretical construct of the morpheme(see eg [47 54 58 166 169 170]) The present studyillustrates this point for lsquoboundaryrsquo effects in written andoral production When paths in the triphone graph branchuncertainty increases and processing slows down Althoughobserved lsquoboundaryrsquo effects are compatible with a post-Bloomfieldian lexicon the very same effects arise in ourmodel even though morphemes do not figure in any way inour computations

Importantly the simplicity of the linear mappings in ourmodel of the discriminative lexicon allows us to sidestep theproblem that typically arise in computationally implementedfull-scale hierarchical systems namely that errors arisingat lower levels propagate to higher levels The major stepsforward made in recent end-to-end models in languageengineering suggest that end-to-end cognitive models arealso likely to perform much better One might argue thatthe hidden layers of deep learning networks represent thetraditional lsquohidden layersrsquo of phonemes morphemes andword forms mediating between form and meaning in lin-guistic models and offshoots thereof in psychology Howeverthe case of baboon lexical learning [12 13] illustrates thatthis research strategy is not without risk and that simplerdiscriminative networks can substantially outperform deeplearning networks [15] We grant however that it is conceiv-able that model predictions will improve when the presentlinear networks are replaced by deep learning networks whenpresented with the same input and output representationsused here (see Zhao et al [171] for a discussion of lossfunctions for deep learning targeting numeric instead of

Complexity 31

categorical output vectors) What is unclear is whethersuch networks will improve our understandingmdashthe currentlinear networks offer the analyst connection weights betweeninput and output representations that are straightforwardlylinguistically interpretable Where we think deep learningnetworks really will come into their own is bridging the gapbetween for instance the cochlea and retina on the one handand the heuristic representations (letter trigrams) that wehave used as a starting point for our functional theory of themappings between form and meaning

Network flexibility An important challenge for deeplearning networks designed to model human lexical process-ing is to keep the networks flexible and open to rapid updat-ing This openness to continuous learning is required notonly by findings on categorization [40 41 41 42] includingphonetic categorization [172 173] but is also visible in timeseries of reaction times Baayen and Hendrix [17] reportedimproved prediction of visual lexical decision reaction timesin the British Lexicon Project when a naive discriminativelearning network is updated from trial to trial simulating thewithin-experiment learning that goes on as subjects proceedthrough the experiment In natural speech the phenomenonof phonetic convergence between interlocutors [174] likewisebears witness to a lexicon that is constantly recalibratedin interaction with its environment (for the many ways inwhich interlocutorsrsquo speech aligns see [175]) Likewise therapidity with which listeners adapt to foreign accents [176]within a few minutes shows that lexical networks exhibitfast local optimization As deep learning networks typicallyrequire huge numbers of cycles through the training dataonce trained they are at risk of not having the requiredflexibility for fast local adaptation This risk is reduced for(incremental) wide learning as illustrated by the phoneticconvergence evidenced by the model of Arnold et al [18] forauditory comprehension (a very similar point was made byLevelt [177] in his critique of connectionist models in thenineties)

Open questions This initial study necessarily leavesmany questions unanswered Compounds and inflectedderived words have not been addressed and morphologicaloperations such as reduplication infixation and nonconcate-native morphology provide further testing grounds for thepresent approach We also need to address the processingof compounds with multiple constituents ubiquitous inlanguages as different as German Mandarin and EstonianFurthermore in actual utterances words undergo assimila-tion at their boundaries and hence a crucial test case forthe discriminative lexicon is to model the production andcomprehension of words in multi-word utterances For ourproduction model we have also completely ignored stresssyllabification and pitch which however likely has renderedthe production task more difficult than necessary

A challenge for discriminative morphology is the devel-opment of proper semantic matrices For languages withrich morphology such as Estonian current morphologicalparsers will identify case-inflected words as nominativesgenitives or partitives (among others) but these labelsfor inflectional forms do not correspond to the semanticfunctions encoded in these forms (see eg [178 179]) What

is required are computational tools that detect the appropriateinflectional lexomes for these words in the sentences orutterances in which they are embedded

A related problem specifically for the speech productionmodel is how to predict the strongly reduced word formsthat are rampant in conversational speech [111] Our auditorycomprehension network F119886 is confronted with reduced formsin training and the study of Arnold et al [18] indicates thatpresent identification accuracymay approximate human per-formance for single-word recognition Whereas for auditorycomprehension we are making progress what is required forspeech production is a much more detailed understanding ofthe circumstances under which speakers actually use reducedword forms The factors driving reduction may be in partcaptured by the semantic vectors but we anticipate thataspects of the unfolding discourse play an important role aswell which brings us back to the unresolved question of howto best model not isolated words but words in context

An important challenge for our theory which formalizesincremental implicit learning is how to address and modelthe wealth of explicit knowledge that speakers have abouttheir language We play with the way words sound in poetrywe are perplexed by wordsrsquo meanings and reinterpret themso that they make more sense (eg the folk etymology [180]reflected in modern English crawfish a crustacean and nota fish originally Middle English crevis compare ecrevissein French) we teach morphology in schools and we evenfind that making morphological relations explicit may bebeneficial for aphasic patients [181] The rich culture of worduse cannot but interact with the implicit system but howexactly this interaction unfolds andwhat its consequences areat both sides of the consciousunconscious divide is presentlyunclear

Although many questions remain our model is remark-ably successful in capturing many aspects of lexical andmorphological processing in both comprehension and pro-duction Since themodel builds on a combination of linguisti-callymotivated lsquosmartrsquo low-level representations for form andmeaning and large but otherwise straightforward andmathe-matically well-understood two-layer linear networks (to beclear networks without any hidden layers) the conclusionseems justified that algorithmically the ldquomental lexiconrdquomaybe much simpler than previously thought

Appendix

A Vectors Matrices and MatrixMultiplication

Figure 12 presents two data points 119886 and 119887 with coordinates(3 4) and (-2 -3) Arrows drawn from the origin to thesepoints are shown in blue with a solid and dashed linerespectively We refer to these points as vectors Vectors aredenoted with lower case letters in bold and we place the 119909coordinate of a point next to the 119910 coordinate

a = (3 4) b = (minus2 minus3) (A1)

32 Complexity

minus4 minus2 0 2 4

minus4

minus2

02

4

x

y

a

b

x

y

Figure 12 Points 119886 and 119887 can be transformed into points 119901 and 119902 bya linear transformation

We can represent a and b jointly by placing them next to eachother in a matrix For matrices we use capital letters in boldfont

A = ( 3 4minus2 minus3) (A2)

In addition to the points 119886 and 119887 Figure 12 also shows twoother datapoints 119909 and 119910 The matrix for these two points is

B = (minus5 24 minus1) (A3)

Let us assume that points 119886 and 119887 represent the forms oftwo words 1205961 and 1205962 and that the points 119909 and 119910 representthe meanings of these two words (below we discuss in detailhow exactly wordsrsquo forms and meanings can be representedas vectors of numbers) As morphology is the study of therelation between wordsrsquo forms and their meanings we areinterested in how to transform points 119886 and b into points119909 and 119910 and likewise in how to transform points 119909 and119910 back into points 119886 and 119887 The first transformation isrequired for comprehension and the second transformationfor production

We begin with the transformation for comprehensionFormally we have matrix A which we want to transforminto matrix B A way of mapping A onto B is by means ofa linear transformation using matrix multiplication For 2 times2 matrices matrix multiplication is defined as follows(1198861 11988621198871 1198872) (1199091 11990921199101 1199102) = (11988611199091 + 11988621199101 11988611199092 + 1198862119910211988711199091 + 11988721199101 11988711199092 + 11988721199102) (A4)

Such a mapping of A onto B is given by a matrix F

F = ( 1 2minus2 minus1) (A5)

and it is straightforward to verify that indeed( 3 4minus2 minus3) ( 1 2minus2 minus1) = (minus5 24 minus1) (A6)

Using matrix notation we can write

AF = B (A7)

Given A and B how do we obtain F The answeris straightforward but we need two further concepts theidentity matrix and the matrix inverse For multiplication ofnumbers the identity multiplication is multiplication with 1For matrices multiplication with the identity matrix I leavesthe locations of the points unchangedThe identitymatrix hasones on the main diagonal and zeros elsewhere It is easilyverified that indeed(1 00 1) (1199091 11990921199101 1199102) = (1199091 11990921199101 1199102) (1 00 1) = (1199091 11990921199101 1199102) (A8)

For numbers the inverse of multiplication with 119904 is dividingby 119904 (1119904 ) (119904119909) = 119909 (A9)

For matrices the inverse of a square matrix X is that matrixXminus1 such that their product is the identity matrix

Xminus1X = XXminus1 = I (A10)

For nonsquare matrices the inverse Yminus1 is defined such that

Yminus1 (YX) = (XY) 119884minus1 = X (A11)

We find the square matrix F that maps the square matrices Aonto B as follows

AF = B

Aminus1AF = Aminus1B

IF = Aminus1B

F = Aminus1B(A12)

Since for the present example A and its inverse happen to beidentical (Aminus1 = A) we obtain (A6)

Fmaps wordsrsquo form vectors onto wordsrsquo semantic vectorsLet us now consider the reverse and see how we can obtainwordsrsquo form vectors from wordsrsquo semantic vectors That isgiven B we want to find that matrix G which maps B ontoA ie

BG = A (A13)

Solving this equation proceeds exactly as above

BG = A

Bminus1BG = Bminus1A

IG = Bminus1A

G = Bminus1A(A14)

Complexity 33

The inverse of B is

Bminus1 = (13 2343 53) (A15)

and hence we have

(13 2343 53) ( 3 4minus2 minus3) = (minus13 minus2323 13 ) = G (A16)

The inverse of a matrix needs not exist A square matrixthat does not have an inverse is referred to as a singularmatrix Almost all matrices with which we work in theremainder of this study are singular For singular matricesan approximation of the inverse can be used such as theMoore-Penrose generalized inverse (In R the generalizedinverse is available in theMASS package function ginv Thenumeric library that is used by ginv is LAPACK availableat httpwwwnetliborglapack) In this study wedenote the generalized inverse of a matrix X by X1015840

The examples presented here are restricted to 2 times 2matrices but themathematics generalize tomatrices of largerdimensions When multiplying two matrices X and Y theonly constraint is that the number of columns ofX is the sameas the number of rows ofYThe resultingmatrix has the samenumber of rows as X and the same number of columns as Y

In the present study the rows of matrices represent wordsand columns the features of thesewords For instance the rowvectors of A can represent wordsrsquo form features and the rowvectors of and B the corresponding semantic features Thefirst row vector of A is mapped onto the first row vector ofB as follows (3 4) ( 1 2minus2 minus1) = (minus5 2) (A17)

A transformationmatrix such as F can be represented as atwo-layer networkThe network corresponding to F is shownin Figure 13 When the input vector (3 4) is presented to thenetwork the nodes 1198941 and 1198942 are set to 3 and 4 respectively Toobtain the activations 1199001 = minus5 and 1199002 = 2 for the output vector(minus5 2) we cumulate the evidence at the input weighted bythe connection strengths specified on the edges of the graphThus for 1199001 we obtain the value 3 times 1 + 4 times minus2 = minus5 and for1199002 we have 3 times 2 + 4 times minus1 = 2 In other words the networkmaps the input vector (3 4) onto the output vector (minus5 2)

An important property of linear maps is that they areproductive We can present a novel form vector say

s = (2 2) (A18)

to the network and it will map this vector onto a novelsemantic vector Using matrices this new semantic vector isstraightforwardly calculated(2 2) ( 1 2minus2 minus1) = (minus2 2) (A19)

1 2 minus2 minus1

i1 i2

o1 o2

semantic vectors come out here

form features go in here

connection weightsare the transformationmatrix

Figure 13 The linear network corresponding to the transformationmatrix F There are two input nodes 1198941 and 1198942 (in blue)and twooutput nodes 1199001 and 1199002 in red When the input vector (3 4) ispresented to the network the value at 1199001 is 3 times 1 + 4 times minus2 = minus5and that at 1199002 is 3 times 2 + 4 times minus1 = 2 The network maps the point(3 4) onto the point (minus5 2)B Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencingis that in the directed graph that has triphones as verticesand an edge between any two vertices that overlap (iesegments 2 and 3 of triphone 1 are identical to segments 1and 2 of triphone 2) the path from a wordrsquos initial triphone(the triphone starting with ) to a wordrsquos final triphone (thetriphone ending with a ) is the path receiving the bestsupport of all possible paths from the initial to the finaltriphone Unfortunately the directed graph containing allvertices and edges is too large tomake computations tractablein a reasonable amount of computation time The followingalgorithm is a heuristic algorithm that first collects potentiallyrelevant vertices and edges and then calculates all pathsstarting fromany initial vertex using theall simple pathsfunction from the igraph package [127] As a first step allvertices that are supported by the stem and whole wordwith network support exceeding 01 are selected To theresulting set those vertices are added that are supported bythe affix which is conditional on these vertices having anetwork support exceeding 095 A directed graph can nowbe constructed and for each initial vertex all paths startingat these vertices are calculated The subset of those pathsreaching a final vertex is selected and the support for eachpath is calculated As longer paths trivially will have greatersupport the support for a path is weighted for its lengthsimply by dividing the raw support by the path length Pathsare ranked by weighted path support and the path withmaximal support is selected as the acoustic image drivingarticulation

It is possible that no path from an initial vertex to a finalvertex is found due to critical boundary triphones not beinginstantiated in the data on which the model was trainedThis happens when the model is assessed on novel inflected

34 Complexity

nobo

dyso

met

hing

ever

ythi

ngno

thin

gev

eryo

nean

ythi

ngan

yone

som

eone

hers

him his

via

than I

exce

ptou

tside

thei

rsw

hoev

erth

ence

min

ein

side

amon

gst its

thei

rou

rm

yyo

uyo

ur her

she

your

selv

esw

hate

ver us we it

they he

them m

em

yself

him

self

hers

elfou

rselv

esyo

urse

lfits

elfth

emse

lves

abou

tw

ithou

tw

ithin

beyo

ndam

ong

betw

een

thro

ugho

utdu

ring

since by

from in of

with at fo

r tobe

low

abov

e up offdo

wn

upon on

agai

nst

tow

ard

tow

ards

arou

ndne

aral

ong

unde

rbe

hind

besid

eth

roug

hon

toin

toov

erac

ross

nobodysomethingeverythingnothingeveryoneanythinganyonesomeonehershimhisviathanIexceptoutsidetheirswhoeverthencemineinsideamongstitstheirourmyyouyourhersheyourselveswhateverusweittheyhethemmemyselfhimselfherselfourselvesyourselfitselfthemselvesaboutwithoutwithinbeyondamongbetweenthroughoutduringsincebyfrominofwithatfortobelowaboveupoffdownupononagainsttowardtowardsaroundnearalongunderbehindbesidethroughontointooveracross

minus01 0 01 02Value

Color Keyand Histogram

050

015

0025

00C

ount

Figure 14 Heatmap for the correlations of the semantic vectors of pronouns and prepositions Note that quantifier pronouns form a groupof their own that prepositions and the other pronouns form distinguisable groups and that within groups further subclusters are visible egforwe and us she and her and across and over All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmapwhich color-codes diagonal elements with the color for the highest correlation of the range we specified 025)

and derived words under cross-validationTherefore verticesand edges are added to the graph for all nonfinal verticesthat are at the end of any of the paths starting at any initialtriphone Such novel vertices and edges are assigned zeronetwork support Typically paths from the initial vertex tothe final vertex can now be constructed and path support isevaluated as above

For an algorithm that allows triphone paths to includecycles which may be the case in languages with much richer

morphology than English see Baayen et al [85] and for codethe R packageWpmWithLdl described therein

C A Heatmap for Function Words(Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors ofpronouns and prepositions is presented in Figure 14

Complexity 35

Data Availability

With the exception of the data from theDistributed Little RedHen Lab all primary data are publicly available from the citedsources

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis Jessie Nixon ChrisWestbury and LuisMienhardt for their constructive feedbackon earlier versions of this manuscript This research wassupported by an ERC advancedGrant (no 742545) to the firstauthor

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010

36 Complexity

[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983

Complexity 37

[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011



is not unique to the present approach but is shared with other algorithms for constructing semantic vectors, such as word2vec. (However, time-variant semantic vectors are in stark contrast to the monadic units representing meanings in models such as proposed by Baayen et al. [45], Taft [46], and Levelt et al. [37]. The distributed representations for words' meanings in the triangle model [47] were derived from WordNet and hence are also time-invariant.) Thus, in what follows, a word's meaning is defined as a semantic vector that is a function of time; it is subject to continuous change. By itself, without context, a word's semantic vector at time t reflects its 'entanglement' with other words.

Obviously, it is extremely unlikely that our understanding of the classes and categories that we discriminate between is well approximated just by lexical co-occurrence statistics. However, we will show that text-based semantic vectors are good enough as a first approximation for implementing a computationally tractable discriminative lexicon.

Returning to the learning rule (1), we note that it is well known to capture important aspects of the dynamics of discrimination learning [48, 49]. For instance, it captures the blocking effect [43, 50, 51] as well as order effects in learning [52]. The blocking effect is the finding that when one feature has been learned to perfection, adding a novel feature does not improve learning. This follows from (1): as $w_{ij}$ tends towards 1, $\Delta_{ij}$ tends towards 0. A novel feature, even though by itself it is perfectly predictive, is blocked from developing a strong connection weight of its own. The order effect has been observed for, e.g., the learning of words for colors. Given objects of fixed shape and size but varying color, and the corresponding color words, it is essential that the objects are presented to the learner before the color word. This ensures that the object's properties become the features for discriminating between the color words. In this case, application of the learning rule (1) will result in a strong connection between the color features of the object and the appropriate color word. If the order is reversed, the color words become features predicting object properties. In this case, application of (1) will result in weights from a color word to object features that are proportional to the probabilities of the object's features in the training data.
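
To make the blocking dynamics concrete, the following minimal R sketch simulates the update assumed in (1) in its standard Rescorla-Wagner form, with maximum learning λ and learning rate ρ (the specific parameter values are illustrative assumptions, not those used later in this study):

```r
# Minimal sketch of the update in (1), assuming the Rescorla-Wagner form:
# for every active cue, Delta = rho * (lambda - summed weight of all active cues).
rho    <- 0.1   # illustrative learning rate
lambda <- 1     # maximum amount of learning
w <- c(A = 0, B = 0)

# Phase 1: cue A alone predicts the outcome; w["A"] approaches lambda.
for (t in 1:200) w["A"] <- w["A"] + rho * (lambda - w["A"])

# Phase 2: the compound cue AB predicts the same outcome. Because the summed
# activation is already close to lambda, Delta is close to 0 and B is blocked.
for (t in 1:200) {
  delta <- rho * (lambda - (w["A"] + w["B"]))
  w["A"] <- w["A"] + delta
  w["B"] <- w["B"] + delta
}
round(w, 3)   # w["A"] is close to 1, w["B"] stays close to 0: the blocking effect
```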

Given the dynamics of discriminative learning, static lexical entries are not useful. Anticipating more detailed discussion below, we argue that actually the whole dictionary metaphor of lexical access is misplaced, and that in comprehension meanings are dynamically created from form, and that in production forms are dynamically created from meanings. Importantly, what forms and meanings are created will vary from case to case, as all learning is fundamentally incremental and subject to continuous recalibration. In order to make the case that this is actually possible and computationally feasible, we first introduce the framework of naive discriminative learning, discuss its limitations, and then lay out how this approach can be enhanced to meet the goals of this study.

2.4. Naive Discriminative Learning. Naive discriminative learning (ndl) is a computational framework grounded in learning rule (1) that was inspired by prior work on discrimination learning (e.g., [52, 53]). A model for reading complex words was introduced by Baayen et al. [54]. This study implemented a two-layer linear network with letter pairs as input features (henceforth cues) and units representing meanings as output classes (henceforth outcomes). Following Word and Paradigm morphology, this model does not include form representations for morphemes or word forms. It is an end-to-end model for the relation between form and meaning that sets itself the task to predict word meanings from sublexical orthographic features. Baayen et al. [54] were able to show that the extent to which cues in the visual input support word meanings, as gauged by the activation of the corresponding outcomes in the network, reflects a wide range of phenomena reported in the lexical processing literature, including the effects of frequency of occurrence, family size [55], relative entropy [56], phonaesthemes (nonmorphemic sounds with a consistent semantic contribution, such as gl in glimmer, glow, and glisten) [57], and morpho-orthographic segmentation [58].

However, the model of Baayen et al. [54] adopted a stark form of naive realism, albeit just for reasons of computational convenience. A word's meaning was defined in terms of the presence or absence of an outcome ($\delta_j$ in (1)). Subsequent work sought to address this shortcoming by developing semantic vectors, using learning rule (1) to predict words from words, as explained above [58, 59]. The term 'lexome' was introduced as a technical term for the outcomes in a form-to-meaning network, conceptualized as pointers to semantic vectors.

In this approach, lexomes do double duty. In a first network, lexomes are the outcomes that the network is trained to discriminate between given visual input. In a second network, the lexomes are the 'atomic' units in a corpus that serve as the input for building a distributional model with semantic vectors. Within the theory unfolded in this study, the term lexome refers only to the elements from which a semantic vector space is constructed. The dimension of this vector space is equal to the number of lexomes, and each lexome is associated with its own semantic vector.

It turns out, however, that mathematically a network discriminating between lexomes given orthographic cues, as in naive discriminative learning models, is suboptimal. In order to explain why the set-up of ndl is suboptimal, we need some basic concepts and notation from linear algebra, such as matrix multiplication and matrix inversion. Readers unfamiliar with these concepts are referred to Appendix A, which provides an informal introduction.

2.4.1. Limitations of Naive Discriminative Learning. Mathematically, naive discriminative learning works with two matrices: a cue matrix C that specifies words' form features, and a matrix S that specifies the targeted lexomes [61]. We illustrate the cue matrix for four words, aaa, aab, abb, and abb, and their letter bigrams aa, ab, and bb. A cell $c_{ij}$ is 1 if word i has bigram j and zero otherwise:


$$
C = \begin{array}{c|ccc}
 & \text{aa} & \text{ab} & \text{bb} \\ \hline
\text{aaa} & 1 & 0 & 0 \\
\text{aab} & 1 & 1 & 0 \\
\text{abb} & 0 & 1 & 1 \\
\text{abb} & 0 & 1 & 1
\end{array} \qquad (2)
$$

The third and fourth words are homographs: they share exactly the same bigrams. We next define a target matrix S that defines for each of the four words the corresponding lexome λ. This is done by setting one bit to 1 and all other bits to zero in the row vectors of this matrix:

$$
S = \begin{array}{c|cccc}
 & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 \\ \hline
\text{aaa} & 1 & 0 & 0 & 0 \\
\text{aab} & 0 & 1 & 0 & 0 \\
\text{abb} & 0 & 0 & 1 & 0 \\
\text{abb} & 0 & 0 & 0 & 1
\end{array} \qquad (3)
$$

The problem that arises at this point is that the word forms jointly set up a space with three dimensions, whereas the targeted lexomes set up a four-dimensional space. This is problematic for the following reason. A linear mapping of a lower-dimensional space A onto a higher-dimensional space B will result in a subspace of B with a dimensionality that cannot be greater than that of A [62]. If A is a space in $\mathbb{R}^2$ and B is a space in $\mathbb{R}^3$, a linear mapping of A onto B will result in a plane in B. All the points in B that are not on this plane cannot be reached from A with the linear mapping.

As a consequence, it is impossible for ndl to perfectly discriminate between the four lexomes (which set up a four-dimensional space) given their bigram cues (which jointly define a three-dimensional space). For the input bb, the network therefore splits its support equally over the lexomes $\lambda_3$ and $\lambda_4$. The transformation matrix F is

$$
F = C'S = \begin{array}{c|cccc}
 & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 \\ \hline
\text{aa} & 1 & 0 & 0 & 0 \\
\text{ab} & -1 & 1 & 0 & 0 \\
\text{bb} & 1 & -1 & 0.5 & 0.5
\end{array} \qquad (4)
$$

and the matrix with predicted vectors $\hat{S}$ is

$$
\hat{S} = CF = \begin{array}{c|cccc}
 & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 \\ \hline
\text{aaa} & 1 & 0 & 0 & 0 \\
\text{aab} & 0 & 1 & 0 & 0 \\
\text{abb} & 0 & 0 & 0.5 & 0.5 \\
\text{abb} & 0 & 0 & 0.5 & 0.5
\end{array} \qquad (5)
$$
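
The toy calculation in (2)-(5) can be checked directly in R, using the generalized inverse from the MASS package; this is only a sketch reproducing the example above:

```r
library(MASS)   # for ginv()

C <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 1, 1,
              0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("aaa", "aab", "abb", "abb"), c("aa", "ab", "bb")))
S <- diag(4)
dimnames(S) <- list(rownames(C), paste0("lambda", 1:4))

Fmat <- ginv(C) %*% S   # the transformation matrix of (4)
Shat <- C %*% Fmat      # the predicted matrix of (5)
round(Shat, 2)          # the two homographs split their support 0.5/0.5
                        # over lambda3 and lambda4
```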

When a cue matrix C and a target matrix S are set up for large numbers of words, the dimension of the ndl cue space will be substantially smaller than the space of S, the dimension of which is equal to the number of lexomes. Thus, in the set-up of naive discriminative learning, we do take into account that words tend to have similar forms, but we do not take into account that words are also similar in meaning. By setting up S as completely orthogonal, we are therefore making it unnecessarily hard to relate form to meaning.

This study therefore implements discrimination learning with semantic vectors of reals replacing the one-bit-on row vectors of the target matrix S. By doing so, we properly reduce the dimensionality of the target space, which in turn makes discriminative learning more accurate. We will refer to this new implementation of discrimination learning as linear discriminative learning (ldl) instead of naive discriminative learning, as the outcomes are no longer assumed to be independent and the networks are mathematically equivalent to linear mappings onto continuous target matrices.

2.5. Linear Transformations and Proportional Analogy. A central concept in word and paradigm morphology is that the form of a regular inflected form stands in a relation of proportional analogy to other words in the inflectional system. As explained by Matthews ([25], p. 192f):

In effect, we are predicting the inflections of servus by analogy with those of dominus. As Genitive Singular domini is to Nominative Singular dominus, so x (unknown) must be to Nominative Singular servus. What then is x? Answer: it must be servi. In notation, dominus : domini = servus : servi. ([25], p. 192f)

Here, form variation is associated with 'morphosyntactic features'. Such features are often naively assumed to function as proxies for a notion of 'grammatical meaning'. However, in word and paradigm morphology these features actually represent something more like distribution classes. For example, the accusative singular forms of nouns belonging to different declensions will typically differ in form, and be identified as the 'same' case in different declensions by virtue of distributional parallels. The similarity in meaning then follows from the distributional hypothesis, which proposes that linguistic items with similar distributions have similar meanings [31, 32]. Thus, the analogy of forms

dominus : domini = servus : servi (6)

is paralleled by an analogy of distributions d:

d(dominus) : d(domini) = d(servus) : d(servi) (7)

Borrowing notational conventions of matrix algebra, we can write

$$
\begin{pmatrix} \text{dominus} \\ \text{domini} \\ \text{servus} \\ \text{servi} \end{pmatrix} \sim \begin{pmatrix} d(\text{dominus}) \\ d(\text{domini}) \\ d(\text{servus}) \\ d(\text{servi}) \end{pmatrix} \qquad (8)
$$

In this study, we operationalize the proportional analogy of word and paradigm morphology by replacing the word forms in (8) by vectors of features that are present or absent in a word, and by replacing words' distributions by semantic vectors. Linear mappings between form and meaning formalize two distinct proportional analogies: one analogy for going from form to meaning, and a second analogy for going from meaning to form. Consider, for instance, a semantic matrix S and a form matrix C with rows representing words:

$$
S = \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}, \qquad C = \begin{pmatrix} p_1 & p_2 \\ q_1 & q_2 \end{pmatrix} \qquad (9)
$$

The transformation matrix G that maps S onto C,

$$
G = \underbrace{\frac{1}{a_1 b_2 - a_2 b_1}\begin{pmatrix} b_2 & -a_2 \\ -b_1 & a_1 \end{pmatrix}}_{S^{-1}}
\underbrace{\begin{pmatrix} p_1 & p_2 \\ q_1 & q_2 \end{pmatrix}}_{C}
= \frac{1}{a_1 b_2 - a_2 b_1}\left[\begin{pmatrix} b_2 p_1 & b_2 p_2 \\ -b_1 p_1 & -b_1 p_2 \end{pmatrix} + \begin{pmatrix} -a_2 q_1 & -a_2 q_2 \\ a_1 q_1 & a_1 q_2 \end{pmatrix}\right], \qquad (10)
$$

can be written as the sum of two matrices (both of which are scaled by $a_1 b_2 - a_2 b_1$). The first matrix takes the first form vector of C and weights it by the elements of the second semantic vector. The second matrix takes the second form vector of C and weights this by the elements of the first semantic vector of S. Thus, with this linear mapping, a predicted form vector is a semantically weighted mixture of the form vectors of C.
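
A numeric check of (10), with made-up values for the entries of S and C (any invertible 2 × 2 S will do), shows that the mapping obtained with an ordinary matrix inverse indeed decomposes into the two semantically weighted matrices:

```r
# Made-up entries for the 2 x 2 example of (9) and (10).
a1 <- 1.0; a2 <- 0.3; b1 <- 0.2; b2 <- 1.0
p1 <- 1.0; p2 <- 0.5; q1 <- 0.4; q2 <- 1.0

S <- matrix(c(a1, a2, b1, b2), nrow = 2, byrow = TRUE)
C <- matrix(c(p1, p2, q1, q2), nrow = 2, byrow = TRUE)

G  <- solve(S) %*% C      # maps the semantic row vectors onto the form row vectors
M1 <- matrix(c( b2 * p1,  b2 * p2, -b1 * p1, -b1 * p2), nrow = 2, byrow = TRUE)
M2 <- matrix(c(-a2 * q1, -a2 * q2,  a1 * q1,  a1 * q2), nrow = 2, byrow = TRUE)

all.equal(G, (M1 + M2) / (a1 * b2 - a2 * b1))   # TRUE: G decomposes as in (10)
```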

The remainder of this paper is structured as follows. We first introduce the semantic vector space S that we will be using, which we derived from the tasa corpus [63, 64]. Next, we show that we can model word comprehension by means of a matrix (or network) F that transforms the cue row vectors of C into the semantic row vectors of S, i.e., CF = S. We then show that we can model the production of word forms given their semantics by a transformation matrix G, i.e., SG = C, where C is a matrix specifying for each word which triphones it contains. As the network is informative only about which triphones are at issue for a word, but remains silent about their order, an algorithm building on graph theory is used to properly order the triphones. All models are evaluated on their accuracy and are also tested against experimental data.

In the general discussion, we will reflect on the consequences for linguistic theory of our finding that it is indeed possible to model the lexicon and lexical processing using systemic discriminative learning, without requiring 'hidden units' such as morphemes and stems.

3. A Semantic Vector Space Derived from the TASA Corpus

The first part of this section describes how we constructed semantic vectors. Specific to our approach is that we constructed semantic vectors not only for content words, but also for inflectional and derivational functions. The semantic vectors for these functions are of especial importance for the speech production component of the model. The second part of this section addresses the validation of the semantic vectors, for which we used paired associate learning scores, semantic relatedness ratings, and semantic plausibility and semantic transparency ratings for derived words.

3.1. Constructing the Semantic Vector Space. The semantic vector space that we use in this study is derived from the tasa corpus [63, 64]. We worked with 752,130 sentences from this corpus, to a total of 10,719,386 word tokens. We used the treetagger software [65] to obtain for each word token its stem and a part of speech tag. Compounds written with a space, such as apple pie and jigsaw puzzle, were, when listed in the celex lexical database [66], joined into one onomasiological unit. Information about words' morphological structure was retrieved from celex.

In computational morphology, several options have been explored for representing the meaning of affixes. Mitchell and Lapata [67] proposed to model the meaning of an affix as a vector that, when added to the vector b of a base word, gives the vector d of the derived word. They estimated this vector by calculating $d_i - b_i$ for all available pairs i of base and derived words, and taking the average of the resulting vectors. Lazaridou et al. [68] and Marelli and Baroni [60] modeled the semantics of affixes by means of matrix operations that take b as input and produce d as output, i.e.,

d = Mb (11)

whereas Cotterell et al. [69] constructed semantic vectors as latent variables in a Gaussian graphical model. What these approaches have in common is that they derive or impute semantic vectors given the semantic vectors of words produced by algorithms such as word2vec, algorithms which work with (stemmed) words as input units.

From the perspective of discriminative linguistics, however, it does not make sense to derive one meaning from another. Furthermore, rather than letting writing conventions dictate what units are accepted as input to one's favorite tool for constructing semantic vectors, including not and again but excluding the prefixes un-, in-, and re-, we included not only lexomes for content words, but also lexomes for prefixes and suffixes as units for constructing a semantic vector space. As a result, semantic vectors for prefixes and suffixes are obtained straightforwardly, together with semantic vectors for content words.

To illustrate this procedure, consider the sentence the boys' happiness was great to observe. Standard methods will apply stemming and remove stop words, resulting in the lexomes boy, happiness, great, and observe being the input to the algorithm constructing semantic vectors from lexical co-occurrences. In our approach, by contrast, the lexomes considered are the, boy, happiness, be, great, to, and observe, and in addition plural, ness, and past. Stop words are retained (although including function words may seem counterproductive from the perspective of semantic vectors in natural language processing applications, they are retained in the present approach for two reasons. First, since in our model semantic vectors drive speech production, in order to model the production of function words, semantic vectors for function words are required. Second, although the semantic vectors of function words typically are less informative (they typically have small association strengths to very large numbers of words), they still show structure, as illustrated in Appendix C for pronouns and prepositions). Inflectional endings are replaced by the inflectional functions they subserve, and for derived words, the semantic function of the derivational suffix is identified as well. In what follows, we outline in more detail what considerations motivate this way of constructing the input for our algorithm constructing semantic vectors. We note here that our method for handling morphologically complex words can be combined with any of the currently available algorithms for building semantic vectors. We also note here that we are not concerned with the question of how lexomes for plurality or tense might be induced from word forms, from world knowledge, or any combination of the two. All we assume is, first, that anyone understanding the sentence the boys' happiness was great to observe knows that more than one boy is involved, and that the narrator situates the event in the past. Second, we take this understanding to drive learning.
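
As an illustration of this coding step, the following hand-built lookup table (hypothetical; the actual pipeline uses treetagger and celex, as described above) maps the tokens of the example sentence onto the lexomes assumed in the text:

```r
# Hypothetical lexome coding for "the boys' happiness was great to observe".
lexome_map <- list(
  "the"       = "the",
  "boys'"     = c("boy", "plural"),
  "happiness" = c("happiness", "ness"),   # a derived word keeps its own content lexome
  "was"       = c("be", "past"),
  "great"     = "great",
  "to"        = "to",
  "observe"   = "observe"
)
unique(unlist(lexome_map))   # the set of lexomes contributed by this sentence
```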

We distinguished the following inflectional functions: comparative and superlative for adjectives; singular and plural for nouns; and past, perfective, continuous, persistence (denoting the persistent relevance of the predicate in sentences such as London is a big city), and person3 (third person singular) for verbs.

The semantics of derived words tend to be characterized by idiosyncrasies [70] that can be accompanied by formal idiosyncrasies (e.g., business and resound) but often are formally unremarkable (e.g., heady, with meanings as diverse as intoxicating, exhilarating, impetuous, and prudent). We therefore paired derived words with their own content lexomes, as well as with a lexome for the derivational function expressed in the derived word. We implemented the following derivational lexomes: ordinal for ordinal numbers, not for negative un and in, undo for reversative un, other for nonnegative in and its allomorphs, ion for ation, ution, ition, and ion, ee for ee, agent for agents with er, instrument for instruments with er, impagent for impersonal agents with er, causer for words with er expressing causers (the differentiation of different semantic functions for er was based on manual annotation of the list of forms with er; the assignment of a given form to a category was not informed by sentence context and hence can be incorrect), again for re, and ness, ity, ism, ist, ic, able, ive, ous, ize, ence, ful, ish, under, sub, self, over, out, mis, and dis for the corresponding prefixes and suffixes. This set-up of lexomes is informed by the literature on the semantics of English word formation [71-73] and targets affixal semantics irrespective of affixal forms (following [74]).

This way of setting up the lexomes for the semantic vector space model illustrates an important aspect of the present approach, namely, that lexomes target what is understood, and not particular linguistic forms. Lexomes are not units of form. For example, in the study of Geeraert et al. [75], the forms die, pass away, and kick the bucket are all linked to the same lexome die. In the present study, we have not attempted to identify idioms, and likewise no attempt was made to disambiguate homographs. These are targets for further research.

In the present study, we did implement the classical distinction between inflection and word formation by representing inflected words with a content lexome for their stem, but derived words with a content lexome for the derived word itself. In this way, the sentence scientists believe that exposure to shortwave ultraviolet rays can cause blindness is associated with the following lexomes: scientist, pl, believe, persistence, that, exposure, sg, to, shortwave, ultraviolet, ray, can, present, cause, blindness, and ness. In our model, sentences constitute the learning events, i.e., for each sentence we train the model to predict the lexomes in that sentence from the very same lexomes in that sentence. Sentences, or for spoken corpora utterances, are more ecologically valid than windows of, say, two words preceding and two words following a target word: the interpretation of a word may depend on the company that it keeps at long distances in a sentence. We also did not remove any function words, contrary to standard practice in distributional semantics (see Appendix C for a heatmap illustrating the kind of clustering observable for pronouns and prepositions). The inclusion of semantic vectors for both affixes and function words is necessitated by the general framework of our model, which addresses not only comprehension but also production, and which in the case of comprehension needs to move beyond lexical identification, as inflectional functions play an important role during the integration of words in sentence and discourse.

Only words with a frequency exceeding 8 occurrences in the tasa corpus were assigned lexomes. This threshold was imposed in order to avoid excessive data sparseness when constructing the distributional vector space. The number of different lexomes that met the criterion for inclusion was 23,562.

We constructed a semantic vector space by training an ndl network on the tasa corpus, using the ndl2 package for R [76]. Weights on lexome-to-lexome connections were recalibrated sentence by sentence, in the order in which they appear in the tasa corpus, using the learning rule of ndl (i.e., a simplified version of the learning rule of Rescorla and Wagner [43] that has only two free parameters: the maximum amount of learning λ (set to 1) and a learning rate ρ (set to 0.001)). This resulted in a 23,562 × 23,562 matrix. Sentence-based training keeps the carbon footprint of the model down, as the number of learning events is restricted to the number of utterances (approximately 750,000) rather than the number of word tokens (approximately 10,000,000). The row vectors of the resulting matrix, henceforth S, are the semantic vectors that we will use in the remainder of this study (work is in progress to further enhance the S matrix by including word sense disambiguation and named entity recognition when setting up lexomes; the resulting lexomic version of the TASA corpus and scripts (in python as well as in R) for deriving the S matrix will be made available at http://www.sfs.uni-tuebingen.de/~hbaayen).
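
The following R sketch shows the sentence-by-sentence update schedule described above (λ = 1, ρ = 0.001). It uses a dense weight matrix for expository purposes only; the actual computations were carried out with the ndl2 package, and a dense 23,562 × 23,562 matrix would be far too large for this naive loop. The objects `lexomes` (the lexome types) and `sentences` (a list with one character vector of lexomes per sentence) are assumptions:

```r
lambda <- 1
rho    <- 0.001

W <- matrix(0, length(lexomes), length(lexomes),
            dimnames = list(lexomes, lexomes))

for (s in sentences) {
  cues <- outcomes <- intersect(unique(s), lexomes)   # cues and outcomes coincide
  act  <- colSums(W[cues, , drop = FALSE])            # summed support for every outcome
  tgt  <- setNames(numeric(length(lexomes)), lexomes)
  tgt[outcomes] <- lambda
  delta <- rho * (tgt - act)                          # same adjustment for each active cue
  W[cues, ] <- sweep(W[cues, , drop = FALSE], 2, delta, "+")
}
# The row vectors of W are the semantic vectors (the matrix S of the text).
```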

The weights on the main diagonal of the matrix tend to be high, unsurprisingly, as in each sentence on which the model is trained, each of the words in that sentence is an excellent predictor of that same word occurring in that sentence. When the focus is on semantic similarity, it is useful to set the main diagonal of this matrix to zero. However, for more general association strengths between words, the diagonal elements are informative and should be retained.

Unlike standard models of distributional semantics, we did not carry out any dimension reduction using, for instance, singular value decomposition. Because we do not work with latent variables, each dimension of our semantic space is given by a column vector of S, and hence each dimension is linked to a specific lexome. Thus, the row vectors of S specify, for each of the lexomes, how well this lexome discriminates between all the lexomes known to the model.

Although semantic vectors of length 23,562 can provide good results, vectors of this length can be challenging for statistical evaluation. Fortunately, it turns out that many of the column vectors of S are characterized by extremely small variance. Such columns can be removed from the matrix without loss of accuracy. In practice, we have found it suffices to work with approximately 4 to 5 thousand columns, selected by calculating column variances and using only those columns with a variance exceeding a threshold value.
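
In code, this column selection amounts to nothing more than a variance filter; the cutoff below is an assumption chosen so that roughly 4 to 5 thousand columns survive:

```r
v <- apply(S, 2, var)                          # variance of every column of S
threshold <- sort(v, decreasing = TRUE)[4500]  # keep about 4500 columns (assumed cutoff)
S_reduced <- S[, v >= threshold]
dim(S_reduced)
```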

3.2. Validation of the Semantic Vector Space. As shown by Baayen et al. [59] and Milin et al. [58, 77], measures based on matrices such as S are predictive for behavioral measures such as reaction times in the visual lexical decision task, as well as for self-paced reading latencies. In what follows, we first validate the semantic vectors of S on two data sets: one data set with accuracies in paired associate learning, and one dataset with semantic similarity ratings. We then considered specifically the validity of semantic vectors for inflectional and derivational functions, by focusing first on the correlational structure of the pertinent semantic vectors, followed by an examination of the predictivity of the semantic vectors for semantic plausibility and semantic transparency ratings for derived words.

3.2.1. Paired Associate Learning. The paired associate learning (PAL) task is a widely used test in psychology for evaluating learning and memory. Participants are given a list of word pairs to memorize. Subsequently, at testing, they are given the first word and have to recall the second word. The proportion of correctly recalled words is the accuracy measure on the PAL task. Accuracy on the PAL test decreases with age, which has been attributed to cognitive decline over the lifetime. However, Ramscar et al. [78] and Ramscar et al. [79] provide evidence that the test actually measures the accumulation of lexical knowledge. In what follows, we use the data on PAL performance reported by desRosiers and Ivison [80]. We fitted a linear mixed model to accuracy in the PAL task as a function of the Pearson correlation r of paired words' semantic vectors in S (but with weights on the main diagonal included, and using the 4275 columns with highest variance), with random intercepts for word pairs, sex and age as control variables, and, crucially, an interaction of r by age. Given the findings of Ramscar and colleagues, we expect to find that the slope of r (which is always negative) as a predictor of PAL accuracy increases with age (indicating decreasing accuracy). Table 1 shows that this prediction is borne out. For age group 20-29 (the reference level of age), the slope for r is estimated at 0.31. For the next age level, this slope is adjusted upward by 0.12, and as age increases, these upward adjustments likewise increase, to 0.21, 0.24, and 0.32 for age groups 40-49, 50-59, and 60-69, respectively. Older participants know their language better and hence are more sensitive to the semantic similarity (or lack thereof) of the words that make up PAL test pairs. For the purposes of the present study, the solid support for r as a predictor for PAL accuracy contributes to validating the row vectors of S as semantic vectors.
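
A model with this structure (fixed effects for r, age, and sex, an r-by-age interaction, and random intercepts for word pairs) can be specified, for instance, with mgcv; the data frame and column names below are assumptions and this is only a sketch, not the original analysis script:

```r
library(mgcv)
# 'pal' is assumed to have columns: accuracy, r, age (factor), sex (factor), wordPair (factor).
m_pal <- gam(accuracy ~ r * age + sex + s(wordPair, bs = "re"),
             data = pal, method = "ML")
summary(m_pal)
```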

Figure 1: Similarity rating in the MEN dataset as a function of the correlation r between the row vectors in S of the words in the test pairs.

3.2.2. Semantic Relatedness Ratings. We also examined performance of S on the MEN test collection [81], which provides for 3000 word pairs crowd-sourced ratings of semantic similarity. For 2267 word pairs, semantic vectors are available in S for both words. Figure 1 shows that there is a nonlinear relation between the correlation of words' semantic vectors in S and the ratings. The plot shows as well that for low correlations the variability in the MEN ratings is larger. We fitted a Gaussian location-scale additive model to this data set, summarized in Table 2, which supported r as a predictor for both the mean MEN rating and the variability in the MEN ratings.
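
Such a Gaussian location-scale model can be fitted with the gaulss family in mgcv, with one smooth of r for the mean and one for the variability; the data frame and column names are again assumptions:

```r
library(mgcv)
# 'men' is assumed to have columns: rating (similarity rating) and r (correlation in S).
m_men <- gam(list(rating ~ s(r),   # location: mean rating as a smooth function of r
                  ~ s(r)),         # scale: variability also modeled as a function of r
             data = men, family = gaulss())
summary(m_men)
```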

To put this performance in perspective, we collected the latent semantic analysis (LSA) similarity scores for the MEN word pairs using the website at http://lsa.colorado.edu. The Spearman correlation for LSA scores and MEN ratings was 0.697, and that for r was 0.704 (both p < 0.0001). Thus, our semantic vectors perform on a par with those of LSA, a well-established older technique that still enjoys wide use in psycholinguistics. Undoubtedly, optimized techniques from computational linguistics such as word2vec will outperform our model. The important point here is that even with training on full sentences rather than using small windows, and even when including function words, performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.

Table 1: Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [-.488, -.253]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

A. parametric coefficients      Estimate   Std. Error   t-value    p-value
intercept                        3.6220     0.6822       5.3096    < 0.0001
r                                0.3058     0.1672       1.8293      0.0691
age=39                           0.2297     0.1869       1.2292      0.2207
age=49                           0.4665     0.1869       2.4964      0.0135
age=59                           0.6078     0.1869       3.2528      0.0014
age=69                           0.8029     0.1869       4.2970    < 0.0001
sex=male                        -0.1074     0.0230      -4.6638    < 0.0001
r:age=39                         0.1167     0.0458       2.5490      0.0117
r:age=49                         0.2090     0.0458       4.5640    < 0.0001
r:age=59                         0.2463     0.0458       5.3787    < 0.0001
r:age=69                         0.3239     0.0458       7.0735    < 0.0001
B. smooth terms                  edf        Ref.df       F-value    p-value
random intercepts word pair      17.8607    18.0000      128.2283  < 0.0001

Table 2: Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location: parameter estimating the mean; scale: parameter estimating the variance through the logb link function (η = log(σ − b)).

A. parametric coefficients      Estimate   Std. Error   t-value    p-value
Intercept [location]             25.2990    0.1799      140.6127   < 0.0001
Intercept [scale]                 2.1297    0.0149      143.0299   < 0.0001
B. smooth terms                  edf        Ref.df       F-value    p-value
TPRS r [location]                 8.3749    8.8869      2766.2399  < 0.0001
TPRS r [scale]                    5.9333    7.0476        87.5384  < 0.0001

3.2.3. Correlational Structure of Inflectional and Derivational Vectors. Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner, and tend to be either not correlated with derivational lexomes (indicated by light yellow) or to be negatively correlated (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show, for a majority of pairs, small positive correlations, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes, and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3500 derived words for 31 different formatives, resulting in a 3500 × 23,562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82] (for LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3531 × 814). LDA accuracy for this classification task with 31 possible outcomes was 0.72. All 31 derivational lexomes were correctly classified.

Figure 2: Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes (individual words will become legible by zooming in on the figure at maximal magnification).

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across 10 permutations: 0.519-0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.
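
A sketch of this classification analysis with MASS::lda, including one permutation run for the baseline, is given below; D and formative stand for the matrix of semantic vectors (low-variance columns removed) and the factor coding each word's formative, and are assumptions about the data objects rather than the original script:

```r
library(MASS)
# D: numeric matrix of semantic vectors; formative: factor with 31 levels (assumptions).
fit <- lda(x = D, grouping = formative)
mean(predict(fit, D)$class == formative)     # classification accuracy

# Permutation baseline: break the link between vectors and formatives.
perm     <- sample(formative)
fit_perm <- lda(x = D, grouping = perm)
mean(predict(fit_perm, D)$class == perm)     # accuracy expected under randomness
```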

How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and at the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype. This vector is close to the cloud of semantic vectors, but it is outside its periphery. This pattern is not specific to ness, but is found for the other derivational lexomes as well. It is intrinsic to our model.

Figure 3: Heatmap for the correlation matrix of lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes (individual words will become legible by zooming in on the figure at maximal magnification).

The reason for this is straightforward. During learning, although, for a derived word, its lexome i and its derivational lexome ness are co-present cues, the derivational lexome occurs in many other words j, and each time another word j is encountered, weights are reduced from ness to i. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property for our model to capture morphological productivity, for comprehension and speech production.

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed for the vectors obtained by subtracting the vector of the base word from that of the derived word, the result is a vector that is embedded inside the cluster of derived vectors and hence inherits semantic idiosyncrasies from all these derived words.

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs consisting of a base and a novel derived form (e.g., accent/accentable) reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs, due to the constraint that the base words had to occur in the tasa corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].
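
Given the matrix S, activation diversity is simply the L1-norm of a lexome's row vector; a minimal helper (the row name in the example call is an assumption):

```r
# Activation diversity: the L1-norm (city-block length) of a semantic vector.
activation_diversity <- function(lexome, S) sum(abs(S[lexome, ]))
# e.g., activation_diversity("ness", S)   # assuming "ness" is a row name of S
```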

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increases. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases, and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies more strong links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive, and were therefore left out of the specification of the model reported here (word frequency was not included as predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by comparing the correlations of (i) the semantic vector of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = -0.66), effectiveness (r = -0.51), awareness (r = -0.50), loneliness (r = -0.45), sickness (r = -0.44), and consciousness (r = -0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) and its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
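
This comparison can be expressed as a one-line function: correlate the derived word's own row of S with the sum of the rows of its base and its affix lexome (the row names in the example call are assumptions):

```r
# Correlation-based transparency measure: the derived word's actual semantic
# vector versus the vector accumulated over its base and affix.
transparency <- function(derived, base, affix, S) {
  cor(S[derived, ], S[base, ] + S[affix, ])
}
# e.g., transparency("kindness", "kind", "ness", S)
```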

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the tasa corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output (a package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz; the package is described in Baayen et al. [85]). We begin by introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

$$
C = \begin{array}{c|cccccccc}
 & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one} & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two} & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array} \qquad (12)
$$

where we use the disc keyboard phonetic alphabet (the 'DIstinct Single Character' representation with one character for each phoneme was introduced by CELEX [86]); for triphones, the symbol # denotes the word boundary. Suppose that the semantic vectors for these words are the row vectors

Complexity 13

26 27

28

29

29

3 3

31

31

32

32

33

33

34

34

35

35

3

3

35

4

45 36

38

40

42

44

Activ

atio

n D

iver

sity

of th

e Der

ivat

iona

l Lex

ome

8 10 12 14 16 186Word Length

36

38

40

42

44

Activ

atio

n D

iver

sity

of th

e Der

ivat

iona

l Lex

ome

6 8 10 12 144Word Length

Figure 4 Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantictransparency ratings (right)

Table 3 GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able again agent ist less and notdata from Marelli and Baroni [60]

A. parametric coefficients              Estimate   Std. Error   t-value   p-value
Intercept                               -1.8072     2.7560      -0.6557    0.5127
Word Length                              0.5901     0.2113       2.7931    0.0057
Activation Diversity                     1.1221     0.6747       1.6632    0.0976
Arousal                                  0.3847     0.1521       2.5295    0.0121
Word Length x Activation Diversity      -0.1318     0.0500      -2.6369    0.0089
B. smooth terms                         edf        Ref.df       F-value   p-value
Random Intercepts Affix                  3.3610     4.0000      55.6723   < 0.0001

of the following matrix S:

S =
             one   two   three
    one    ( 1.0   0.3   0.4 )
    two    ( 0.2   1.0   0.1 )
    three  ( 0.1   0.1   1.0 )                                       (13)

We are interested in a transformation matrix F such that

CF = S (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S (15)

For the present example

F =
             one    two    three
    #wV    ( 0.33   0.10   0.13 )
    wVn    ( 0.33   0.10   0.13 )
    Vn#    ( 0.33   0.10   0.13 )
    #tu    ( 0.10   0.50   0.05 )
    tu#    ( 0.10   0.50   0.05 )
    #Tr    ( 0.03   0.03   0.33 )
    Tri    ( 0.03   0.03   0.33 )
    ri#    ( 0.03   0.03   0.33 )                                    (16)

and for this simple example CF is exactly equal to S.
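For readers who wish to verify this toy example, the following minimal R sketch (using only the ginv function from the MASS package mentioned above) builds C and S and solves for F; the object names are ours.

    library(MASS)                      # provides ginv(), the Moore-Penrose generalized inverse
    # cue matrix C: rows are words, columns are triphones (# marks the word boundary)
    triphones <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
    C <- rbind(one   = c(1, 1, 1, 0, 0, 0, 0, 0),
               two   = c(0, 0, 0, 1, 1, 0, 0, 0),
               three = c(0, 0, 0, 0, 0, 1, 1, 1))
    colnames(C) <- triphones
    # toy semantic matrix S of equation (13)
    S <- rbind(one   = c(1.0, 0.3, 0.4),
               two   = c(0.2, 1.0, 0.1),
               three = c(0.1, 0.1, 1.0))
    colnames(S) <- c("one", "two", "three")
    F <- ginv(C) %*% S                 # equation (15): F = C'S
    dimnames(F) <- list(triphones, colnames(S))
    round(C %*% F, 2)                  # reproduces S for this toy example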

In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams.


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words. te: tensor product smooth; s: thin plate regression spline.

A. parametric coefficients                           Estimate   Std. Error   t-value    p-value
intercept [location]                                  5.5016     0.2564      21.4537    < 0.0001
intercept [scale]                                    -0.4135     0.0401     -10.3060    < 0.0001
B. smooth terms                                      edf        Ref.df       F-value    p-value
te(activation diversity, word length) [location]      5.3879     6.2897      23.7798     0.0008
s(derivational lexome) [location]                    14.3180    15.0000     380.6254    < 0.0001
s(activation diversity) [scale]                       2.1282     2.5993      28.4336    < 0.0001
s(word length) [scale]                                1.7722     2.1403     132.7311    < 0.0001

A comparison of the performance of the two models sheds light on the role of words' "sound image" in word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is most strongly correlated with the target semantic vector s_i of all target vectors s_j across all words j.
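In R, this evaluation is a nearest-neighbor search under the correlation measure. A minimal sketch, assuming C, F, and S as defined above (with one row per word, in the same order), is:

    Shat <- C %*% F                        # predicted semantic vectors, one row per word
    R    <- cor(t(Shat), t(S))             # R[i, j] = correlation of prediction i with target j
    best <- apply(R, 1, which.max)         # index of the best-correlated target per word
    accuracy <- mean(best == seq_len(nrow(S)))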

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations) and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, to a total of 11480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP [98]), (iii)


the word being available in the CELEX database, and the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3465 unique trigrams, resulting in a 11480 x 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 x 10^-7. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.
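As an illustration of how such an orthographic cue matrix can be assembled, the following hedged R sketch (function and object names are ours, not those of the WpmWithLdl package) extracts letter trigrams, with # marking the word boundary, and fills an indicator matrix:

    words <- c("one", "two", "three")                # 11480 words in the actual dataset
    forms <- paste0("#", words, "#")                 # pad with boundary symbols
    get_trigrams <- function(w) substring(w, 1:(nchar(w) - 2), 3:nchar(w))
    trigram_list <- lapply(forms, get_trigrams)
    trigrams <- sort(unique(unlist(trigram_list)))   # 3465 unique trigrams in the real data
    C <- t(sapply(trigram_list, function(tg) as.integer(trigrams %in% tg)))
    dimnames(C) <- list(words, trigrams)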

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, the performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated in the mean with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic

vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derivational semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route: From Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping a 11480 x 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ1, obtained with network F of the first route, and a vector ŝ2, obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
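A compact way to set up this second route, under the same least-squares logic as before (matrix names are ours; ginv is again taken from MASS), is:

    K     <- ginv(Cletters) %*% Tphones    # step 1: letter trigrams -> triphones
    That  <- Cletters %*% K                # predicted triphone support for every word
    H     <- ginv(That) %*% S              # step 2: predicted triphones -> semantics
    Shat1 <- Cletters %*% F                # route 1: direct mapping to semantics
    Shat2 <- That %*% H                    # route 2: indirect mapping via the auditory image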

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
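Assuming Shat1 and Shat2 from the sketch above hold the row-wise predicted semantic vectors of the two routes, and that the rows and columns of S are indexed by the same lexomes, the three measures can be computed as:

    total_activation_diversity <- rowSums(abs(Shat1)) + rowSums(abs(Shat2))  # sum of the two L1-norms
    route_congruency <- sapply(seq_len(nrow(Shat1)),
                               function(i) cor(Shat1[i, ], Shat2[i, ]))
    prior <- colSums(abs(S))[rownames(S)]   # L1-norm of each lexome's column vector in S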

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as

total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
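A model of this general structure can be specified with the mgcv package. The formula below is a hedged sketch with hypothetical column names, not the exact specification behind Table 5; RTinv stands for the inverse-transformed response latency (e.g., -1000/RT).

    library(mgcv)
    m_ldl <- bam(RTinv ~ word_type + word_length +
                   s(total_activation_diversity) +
                   te(route_congruency, prior),
                 data = blp)
    summary(m_ldl)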

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed up the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel). For medium-range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with the total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign, going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 651.01 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increased by 100.59 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increased by no less than 147.90 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger



Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error   t-value    p-value
intercept                         -1.7774     0.0087      -205.414   < 0.0001
word type: Inflected               0.1110     0.0059        18.742   < 0.0001
word type: Monomorphemic           0.0233     0.0054         4.322   < 0.0001
word length                        0.0117     0.0011        10.981   < 0.0001
B. smooth terms                   edf        Ref.df        F         p-value
s(total activation diversity)      6.002      7.222         23.6     < 0.0001
te(route congruency, prior)       14.673     17.850         21.37    < 0.0001


Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right), on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures from ndl. te: tensor product smooth.

A. parametric coefficients                      Estimate   Std. Error   t-value    p-value
intercept                                       -1.6934     0.0086      -195.933   < 0.0001
word type: Inflected                             0.0286     0.0052         5.486   < 0.0001
word type: Monomorphemic                         0.0042     0.0055         0.770   = 0.4
word length                                      0.0066     0.0011         5.893   < 0.0001
B. smooth terms                                 edf        Ref.df        F         p-value
te(activation, activation diversity, prior)     41.34      49.72         10.84     < 0.0001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough/) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token

to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk, and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts,



Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).


Figure 8: The oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, is shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean, for relatively clean parts where there is speech without background noise or music, and noisy, for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, for a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a, with dimensions 131673 audio tokens x 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of

the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa. Thus, (3 8; 7 2)^T = (3 7; 8 2). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus, a 4 x 2 matrix becomes a 2 x 4 matrix when transposed.)

CF = S
C^T C F = C^T S
(C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
F = (C^T C)^{-1} C^T S                                               (17)

In this way, matrix inversion is required only for a much smaller matrix, C^T C, which is a square matrix of size 40639 x 40639.
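In R, this amounts to solving the normal equations rather than computing a pseudoinverse of the large cue matrix; a minimal sketch (with Ca the token-by-FBSF matrix and Sa the token-by-dimension semantic matrix) is:

    # solve (C^T C) F = C^T S directly instead of computing ginv(Ca)
    F <- solve(crossprod(Ca), crossprod(Ca, Sa))

For a singular or near-singular crossproduct, a generalized inverse or a ridge penalty would be needed; the sketch assumes the plain solve suffices.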

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 x 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is due primarily to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-class classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating

further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

S =
             one   two   three
    one    ( 1.0   0.3   0.4 )
    two    ( 0.2   1.0   0.1 )
    three  ( 0.1   0.1   1.0 )                                       (18)

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T =
             #wV  wVn  Vn#  #tu  tu#  #Tr  Tri  ri#
    one    (  1    1    1    0    0    0    0    0  )
    two    (  0    0    0    1    1    0    0    0  )
    three  (  0    0    0    0    0    1    1    1  )                (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T (22)

we have

G = S′T (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂ (24)

For the present example, the generalized inverse of S, S′, is

S′ =
              one     two     three
    one    (  1.10   -0.29   -0.41 )
    two    ( -0.21    1.07   -0.02 )
    three  ( -0.09   -0.08    1.04 )                                 (25)

and the transformation matrix G is

G =
             #wV    wVn    Vn#    #tu    tu#    #Tr    Tri    ri#
    one    (  1.10   1.10   1.10  -0.29  -0.29  -0.41  -0.41  -0.41 )
    two    ( -0.21  -0.21  -0.21   1.07   1.07  -0.02  -0.02  -0.02 )
    three  ( -0.09  -0.09  -0.09  -0.08  -0.08   1.04   1.04   1.04 )    (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.

We made use of the S matrix, which we derived from the TASA corpus as described in Section 3. The majority of columns of the 23561 x 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w x n matrix that specifies for each of w words an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives

(elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 x 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 x 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m); those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the


longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
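The following R sketch illustrates this ordering step for the toy word one, using all_simple_paths() from igraph; the graph construction in the full model is more elaborate (see the Appendix), so this only conveys the idea.

    library(igraph)
    active <- c("#wV", "wVn", "Vn#")      # triphones whose activation exceeds the 0.99 threshold
    # connect triphone u to v when the last two phones of u equal the first two of v
    edges <- expand.grid(from = active, to = active, stringsAsFactors = FALSE)
    edges <- edges[substr(edges$from, 2, 3) == substr(edges$to, 1, 2), ]
    g <- graph_from_data_frame(edges, vertices = data.frame(name = active))
    starts <- active[substr(active, 1, 1) == "#"]   # left-edge triphones begin with the boundary symbol
    ends   <- active[substr(active, 3, 3) == "#"]
    paths  <- unlist(lapply(starts, function(v) all_simple_paths(g, from = v, to = ends)),
                     recursive = FALSE)
    longest <- paths[[which.max(sapply(paths, length))]]
    names(longest)                         # "#wV" "wVn" "Vn#", the word's triphone path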

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_(m,a) from S_m and add the semantic vector s_a of the affix:

S_a = S_(m,a) + i ⊗ s_a (27)

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

    [ S_m ]          [ T_m ]
    [ S_a ]  G_a  =  [ T_a ]                                         (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function, a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

    [ S_m   ]         [ T_m   ]
    [ S_a1  ]         [ T_a1  ]
    [ S_a2  ]   G  =  [ T_a2  ]
    [  ...  ]         [  ...  ]
    [ S_an  ]         [ T_an  ]                                      (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
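A hedged R sketch of how such an augmented pair of matrices can be assembled (the object names are ours: Sm and Tm hold the monolexomic rows, affix_vectors the semantic vectors of the inflectional functions, attested_bases the base words attested with each function, and T_inflected the triphone vectors of the inflected forms in the same row order):

    augment_one_affix <- function(a, base_words) {
      # equation (27): add the affix vector to each base word's semantic vector
      Sm[base_words, , drop = FALSE] +
        matrix(affix_vectors[a, ], nrow = length(base_words),
               ncol = ncol(Sm), byrow = TRUE)
    }
    S_aug <- rbind(Sm, do.call(rbind, lapply(names(attested_bases), function(a)
                     augment_one_affix(a, attested_bases[[a]]))))
    T_aug <- rbind(Tm, T_inflected)
    G <- ginv(S_aug) %*% T_aug        # equation (30): a single mapping for all functions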

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the TreeTagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 x 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the CELEX database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with CELEX phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus,


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.
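Schematically, a single fold of this procedure can be written as follows, assuming S_aug and T_aug from the sketch above and a vector inflected_rows with the row indices of the inflected forms:

    folds   <- split(sample(inflected_rows), rep(1:10, length.out = length(inflected_rows)))
    heldout <- folds[[1]]                               # one of the ten folds
    train   <- setdiff(seq_len(nrow(S_aug)), heldout)   # all stems plus 90% of the inflected forms
    G_fold  <- ginv(S_aug[train, ]) %*% T_aug[train, ]
    That    <- S_aug[heldout, ] %*% G_fold              # predicted triphone support, equation (21)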

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones, without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that

contained an affix that occurred at least 30 times in our dataset (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), for a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the CELEX target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for trigrams changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.



Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight

goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
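In mgcv, a model with this structure can be specified roughly as follows; the column names are hypothetical, and the exact smooth specification behind Table 7 may differ.

    library(mgcv)
    m_dur <- bam(log_rel_duration ~ log_ldl_edge_weight + phrase_rate + log_ncount +
                   s(log_frequency) +        # thin plate regression spline
                   s(word, bs = "re") +      # random intercepts for word
                   s(speaker, bs = "re"),    # random intercepts for speaker
                 data = buckeye)
    summary(m_dur)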

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients    Estimate     Std. Error   t-value    p-value
Intercept                      -1.9165      0.0592      -32.3908   < 0.0001
Log LDL Edge Weight            -0.1358      0.0400       -3.3960    0.0007
Phrase Rate                     0.0047      0.0021        2.2449    0.0248
Log Ncount                      0.1114      0.0189        5.8837   < 0.0001
B. smooth terms               edf          Ref.df       F-value    p-value
TPRS log Frequency              1.9007      1.9050        7.5495    0.0004
random intercepts word       1149.8839   1323.0000       31.4919   < 0.0001
random intercepts speaker      30.2995     39.0000       34.5579   < 0.0001

maps (TSOMs [139–141]). TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm, but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very

simple discriminative algorithm Second we encounteredinconsistencies in the phonological forms retrieved from thecelex database inconsistencies that in part are due to theuse of discrete triphones Third many cases where the modelpredictions do not match the targeted triphone sequence thetargeted forms have a minor irregularity (eg resound with[z] instead of [s]) Fourth several of the typical errors that themodel makes are known kinds of speech errors or reducedforms that one might encounter in engaged conversationalspeech

It is noteworthy that the model is almost completely data-driven. The triphones are derived from CELEX and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the TASA corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the WEAVER model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to prevent graphs from becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual-route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content


lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically, compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
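A minimal sketch of this calculation in R, assuming the cue matrix C and the lexome matrix L have been set up as described above (the object names are ours), could look as follows.

library(MASS)           # provides ginv(), the Moore-Penrose generalized inverse

# C: one row per word token, one column per letter trigram (0/1 cues)
# L: one row per word token, giving the targeted semantic vector of its lexome(s)
F <- ginv(C) %*% L      # solve CF = L in the least-squares sense
L_hat <- C %*% F        # estimated (predicted) semantic vectors

# correlations of each predicted row vector with each targeted row vector,
# of the kind plotted in Figure 10
r <- cor(t(L_hat), t(L))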

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones are in turn mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

Sentence | Lexomes
John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student

[Figure 10 appears here: four panels, titled John, John Welsch, Janet, and Clark, each plotting the Pearson correlation (vertical axis, from -0.5 to 1.0) of the estimated semantic vector with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.]

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored


[Figure 11 appears here: the input systems (cochlea, retina) feed auditory cues (FBS features) and orthographic cues (trigrams); these are mapped onto the semantic vectors S (TASA), which in turn are mapped onto auditory targets (triphones) and orthographic targets (trigrams) driving the output systems (speaking, typing). The arcs are labeled with the networks C_a, C_o, F_a, F_o, K_a, H_a, G_a, G_o, T_a, and T_o.]
Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
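By way of illustration, the following sketch implements the Widrow-Hoff update for a single learning event; the function and the learning rate eta are our own illustrative choices, not code from the packages used in this study.

# incremental Widrow-Hoff update of a weight matrix W (cues x outcomes),
# given a cue vector and a target vector for one learning event
widrow_hoff_update <- function(W, cue, target, eta = 0.001) {
  prediction <- as.vector(cue %*% W)          # current output of the network
  W + eta * (cue %o% (target - prediction))   # adjust weights in proportion to the error
}

# usage: three cues, two semantic dimensions
W <- matrix(0, nrow = 3, ncol = 2)
W <- widrow_hoff_update(W, cue = c(1, 0, 1), target = c(0.5, -0.2))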

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures (typically represented by graphs constructed over semantic primitives; see, e.g., [154]) are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1,000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active),

tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference for sequences of words, and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to update network weights very efficiently during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4,000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4,000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change


considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes of older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of

the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of


categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning on the other.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

a = (3, 4),  b = (-2, -3).  (A.1)


[Figure 12 appears here: the points a, b, x, and y plotted in the x-y plane, with both axes running from -4 to 4.]

Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

A = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}.  (A.2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

B = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.  (A.3)

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}.  (A.4)

Such a mapping of A onto B is given by a matrix F

F = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix},  (A.5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.  (A.6)

Using matrix notation, we can write

AF = B.  (A.7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}.  (A.8)

For numbers, the inverse of multiplication with s is dividing by s:

\left(\frac{1}{s}\right)(sx) = x.  (A.9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

X^{-1}X = XX^{-1} = I.  (A.10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

Y^{-1}(YX) = (XY)Y^{-1} = X.  (A.11)

We find the square matrix F that maps the square matrix A onto B as follows:

AF = B
A^{-1}AF = A^{-1}B
IF = A^{-1}B
F = A^{-1}B.  (A.12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A.6).
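These calculations are easily verified in R; the snippet below simply restates (A.2), (A.3), and (A.12) for the toy matrices of this appendix.

A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
F <- solve(A) %*% B   # F = A^-1 B, cf. (A.12); here solve(A) happens to equal A
A %*% F               # returns B, verifying (A.6)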

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find the matrix G which maps B onto A, i.e.,

BG = A.  (A.13)

Solving this equation proceeds exactly as above:

BG = A
B^{-1}BG = B^{-1}A
IG = B^{-1}A
G = B^{-1}A.  (A.14)


The inverse of B is

B^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix},  (A.15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = G.  (A.16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
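In R, the corresponding calculation with the generalized inverse looks as follows; for the nonsingular toy matrices of this appendix it returns the same result as solve(), but unlike solve() it also applies to the singular matrices used in the remainder of this study.

library(MASS)                     # provides ginv()
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
G <- ginv(B) %*% A                # G = B'A, cf. (A.14) and (A.16)
B %*% G                           # returns A (up to rounding), verifying (A.13)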

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3, 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5, 2).  (A.17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

s = (2, 2),  (A.18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2, 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2, 2).  (A.19)
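The same productivity can be verified directly in R with the transformation matrix F of (A.5):

F <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)
c(3, 4) %*% F   # (-5, 2): the trained mapping of (A.6)
c(2, 2) %*% F   # (-2, 2): a novel form vector is mapped productively, cf. (A.19)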


Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5, and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol) to a word's final triphone (the triphone ending with the word boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths will trivially have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
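The following sketch illustrates the path-selection step with igraph. The objects edges (a two-column character matrix of overlapping triphones) and support (a named vector of network support per triphone) are hypothetical, and the use of "#" as word boundary marker is our assumption; this is not the implementation used in the present study.

library(igraph)

g <- graph_from_edgelist(edges, directed = TRUE)
vnames  <- V(g)$name
initial <- V(g)[startsWith(vnames, "#")]   # triphones at the word's left edge
final   <- V(g)[endsWith(vnames, "#")]     # triphones at the word's right edge

# all simple paths from any initial vertex to any final vertex
paths <- unlist(lapply(as_ids(initial), function(v)
  all_simple_paths(g, from = v, to = final)), recursive = FALSE)

# weight raw path support by path length and select the best-supported path
score <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
best  <- paths[[which.max(score)]]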

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected


[Figure 14 appears here: a heatmap of the pairwise correlations of the semantic vectors of English pronouns and prepositions, with a color key and histogram for correlation values ranging from approximately -0.1 to 0.2.]

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning "orthographic" representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dąbrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," in 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?," Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom Variation: Experimental Data and a Blueprint of a Computational Model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling, et al., "ndl2: Naive discriminative learning: an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The Mismeasurement of Mind: Life-Span Changes in Paired-Associate-Learning Scores Reflect the 'Cost' of Learning, Not Cognitive Decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of Speech Perception and Phonological Processing in Reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A Spreading-Activation Theory of Retrieval in Sentence Production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Journal of Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-Semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal, et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of Reading Aloud: Dual-Route and Parallel-Distributed-Processing Approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic Phonetic Transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL," R package version 1.0.1, 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic Morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide Learning for Auditory Comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss Functions for Image Restoration With Neural Networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.




C = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} \qquad (2)

(rows: the words aaa, aab, abb, abb; columns: the bigrams aa, ab, bb)

The third and fourth words are homographs: they share exactly the same bigrams. We next define a target matrix S that defines for each of the four words the corresponding lexome λ. This is done by setting one bit to 1 and all other bits to zero in the row vectors of this matrix:

S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (3)

(rows: aaa, aab, abb, abb; columns: λ1, λ2, λ3, λ4)

The problem that arises at this point is that the word forms jointly set up a space with three dimensions, whereas the targeted lexomes set up a four-dimensional space. This is problematic for the following reason. A linear mapping of a lower-dimensional space A onto a higher-dimensional space B will result in a subspace of B with a dimensionality that cannot be greater than that of A [62]. If A is a space in R^2 and B is a space in R^3, a linear mapping of A onto B will result in a plane in B. All the points in B that are not on this plane cannot be reached from A with the linear mapping.

As a consequence, it is impossible for ndl to perfectly discriminate between the four lexomes (which set up a four-dimensional space) given their bigram cues (which jointly define a three-dimensional space). For the input abb, the network therefore splits its support equally over the lexomes λ3 and λ4. The transformation matrix F is

F = C'S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 1 & -1 & 0.5 & 0.5 \end{pmatrix} \qquad (4)

(rows: aa, ab, bb; columns: λ1, λ2, λ3, λ4)

and the matrix with predicted vectors Ŝ is

Ŝ = CF = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0.5 & 0.5 \\ 0 & 0 & 0.5 & 0.5 \end{pmatrix} \qquad (5)

(rows: aaa, aab, abb, abb; columns: λ1, λ2, λ3, λ4)
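To make the toy example concrete, the following R sketch (an illustration, not code from the original study; it only assumes the MASS package, whose ginv function for the Moore-Penrose generalized inverse is also used in Section 4.1) reproduces the matrices in (2) to (5).

```r
library(MASS)  # provides ginv(), the Moore-Penrose generalized inverse

# cue matrix C of (2): rows are the words aaa, aab, abb, abb; columns the bigrams aa, ab, bb
C <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 1, 1,
              0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("aaa", "aab", "abb", "abb"), c("aa", "ab", "bb")))

# target matrix S of (3): one lexome bit on per word
S <- diag(4)
dimnames(S) <- list(rownames(C), paste0("lambda", 1:4))

Fmat <- ginv(C) %*% S   # transformation matrix F of (4)
Shat <- C %*% Fmat      # predicted semantic vectors of (5)
round(Shat, 2)          # the two homographic rows split their support 0.5/0.5 over lambda3 and lambda4
```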

When a cue matrix C and a target matrix S are set up for large numbers of words, the dimension of the ndl cue space will be substantially smaller than the space of S, the dimension of which is equal to the number of lexomes. Thus, in the set-up of naive discriminative learning, we do take into account that words tend to have similar forms, but we do not take into account that words are also similar in meaning. By setting up S as completely orthogonal, we are therefore making it unnecessarily hard to relate form to meaning.

This study therefore implements discrimination learning with semantic vectors of reals replacing the one-bit-on row vectors of the target matrix S. By doing so, we properly reduce the dimensionality of the target space, which in turn makes discriminative learning more accurate. We will refer to this new implementation of discrimination learning as linear discriminative learning (ldl) instead of naive discriminative learning, as the outcomes are no longer assumed to be independent and the networks are mathematically equivalent to linear mappings onto continuous target matrices.

2.5. Linear Transformations and Proportional Analogy

A central concept in word and paradigm morphology is that the form of a regular inflected form stands in a relation of proportional analogy to other words in the inflectional system. As explained by Matthews ([25], p. 192f):

In effect we are predicting the inflections of servus by analogy with those of dominus. As Genitive Singular domini is to Nominative Singular dominus, so x (unknown) must be to Nominative Singular servus. What then is x? Answer: it must be servi. In notation: dominus : domini = servus : servi ([25], p. 192f).

Here form variation is associated with 'morphosyntactic features'. Such features are often naively assumed to function as proxies for a notion of 'grammatical meaning'. However, in word and paradigm morphology these features actually represent something more like distribution classes. For example, the accusative singular forms of nouns belonging to different declensions will typically differ in form and be identified as the 'same' case in different declensions by virtue of distributional parallels. The similarity in meaning then follows from the distributional hypothesis, which proposes that linguistic items with similar distributions have similar meanings [31, 32]. Thus, the analogy of forms

dominus : domini = servus : servi \qquad (6)

is paralleled by an analogy of distributions d:

d(dominus) : d(domini) = d(servus) : d(servi) \qquad (7)

Borrowing notational conventions of matrix algebra, we can write

\begin{pmatrix} \text{dominus} \\ \text{domini} \\ \text{servus} \\ \text{servi} \end{pmatrix} \sim \begin{pmatrix} d(\text{dominus}) \\ d(\text{domini}) \\ d(\text{servus}) \\ d(\text{servi}) \end{pmatrix} \qquad (8)

In this study we operationalize the proportional analogy of word and paradigm morphology by replacing the word forms in (8) by vectors of features that are present or absent in a word, and by replacing words' distributions by semantic vectors. Linear mappings between form and meaning formalize


two distinct proportional analogies: one analogy for going from form to meaning, and a second analogy for going from meaning to form. Consider, for instance, a semantic matrix S and a form matrix C, with rows representing words:

S = \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}, \qquad C = \begin{pmatrix} p_1 & p_2 \\ q_1 & q_2 \end{pmatrix} \qquad (9)

The transformation matrix G that maps S onto C,

G = S^{-1}C = \frac{1}{a_1 b_2 - a_2 b_1} \begin{pmatrix} b_2 & -a_2 \\ -b_1 & a_1 \end{pmatrix} \begin{pmatrix} p_1 & p_2 \\ q_1 & q_2 \end{pmatrix} = \frac{1}{a_1 b_2 - a_2 b_1} \left[ \begin{pmatrix} b_2 p_1 & b_2 p_2 \\ -b_1 p_1 & -b_1 p_2 \end{pmatrix} + \begin{pmatrix} -a_2 q_1 & -a_2 q_2 \\ a_1 q_1 & a_1 q_2 \end{pmatrix} \right] \qquad (10)

can be written as the sum of two matrices (both of which are scaled by 1/(a_1 b_2 - a_2 b_1)). The first matrix takes the first form vector of C and weights it by the elements of the second semantic vector. The second matrix takes the second form vector of C and weights this by the elements of the first semantic vector of S. Thus, with this linear mapping, a predicted form vector is a semantically weighted mixture of the form vectors of C.
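As a numeric illustration of (9) and (10), the following R sketch (the numbers are made up for illustration) verifies that the mapping G = S^{-1}C takes the semantic row vectors exactly onto the form row vectors.

```r
# made-up 2 x 2 example for (9)-(10): S holds semantic, C form row vectors
S <- matrix(c(1.0, 0.3,
              0.2, 1.0), nrow = 2, byrow = TRUE)   # rows: word 1, word 2
C <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)

G <- solve(S) %*% C   # S^{-1} C, the transformation matrix G of (10)
S %*% G               # recovers C: each predicted form vector is a semantically
                      # weighted mixture of the form vectors of C
```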

The remainder of this paper is structured as follows. We first introduce the semantic vector space S that we will be using, which we derived from the tasa corpus [63, 64]. Next, we show that we can model word comprehension by means of a matrix (or network) F that transforms the cue row vectors of C into the semantic row vectors of S, i.e., CF = S. We then show that we can model the production of word forms given their semantics by a transformation matrix G, i.e., SG = C, where C is a matrix specifying for each word which triphones it contains. As the network is informative only about which triphones are at issue for a word, but remains silent about their order, an algorithm building on graph theory is used to properly order the triphones. All models are evaluated on their accuracy and are also tested against experimental data.

In the general discussion, we will reflect on the consequences for linguistic theory of our finding that it is indeed possible to model the lexicon and lexical processing using systemic discriminative learning, without requiring 'hidden units' such as morphemes and stems.

3. A Semantic Vector Space Derived from the TASA Corpus

The first part of this section describes how we constructed the semantic vectors. Specific to our approach is that we constructed semantic vectors not only for content words but also for inflectional and derivational functions. The semantic vectors for these functions are of especial importance for the speech production component of the model. The second part of this section addresses the validation of the semantic vectors, for which we used paired associate learning scores, semantic relatedness ratings, and semantic plausibility and semantic transparency ratings for derived words.

3.1. Constructing the Semantic Vector Space

The semantic vector space that we use in this study is derived from the tasa corpus [63, 64]. We worked with 752,130 sentences from this corpus, to a total of 10,719,386 word tokens. We used the treetagger software [65] to obtain for each word token its stem and a part-of-speech tag. Compounds written with a space, such as apple pie and jigsaw puzzle, were, when listed in the celex lexical database [66], joined into one onomasiological unit. Information about words' morphological structure was retrieved from celex.

In computational morphology, several options have been explored for representing the meaning of affixes. Mitchell and Lapata [67] proposed to model the meaning of an affix as a vector that, when added to the vector b of a base word, gives the vector d of the derived word. They estimated this vector by calculating d_i - b_i for all available pairs i of base and derived words, and taking the average of the resulting vectors. Lazaridou et al. [68] and Marelli and Baroni [60] modeled the semantics of affixes by means of matrix operations that take b as input and produce d as output, i.e.,

d = Mb \qquad (11)

whereas Cotterell et al. [69] constructed semantic vectors as latent variables in a Gaussian graphical model. What these approaches have in common is that they derive or impute semantic vectors given the semantic vectors of words produced by algorithms such as word2vec, algorithms which work with (stemmed) words as input units.

From the perspective of discriminative linguistics, however, it does not make sense to derive one meaning from another. Furthermore, rather than letting writing conventions dictate what units are accepted as input to one's favorite tool for constructing semantic vectors, including not and again but excluding the prefixes un-, in-, and re-, we included not only lexomes for content words but also lexomes for prefixes and suffixes as units for constructing a semantic vector space. As a result, semantic vectors for prefixes and suffixes are obtained straightforwardly, together with semantic vectors for content words.

To illustrate this procedure, consider the sentence the boys' happiness was great to observe. Standard methods will apply stemming and remove stop words, resulting in the lexomes boy, happiness, great, and observe being the input to the algorithm constructing semantic vectors from lexical co-occurrences. In our approach, by contrast, the lexomes considered are the, boy, happiness, be, great, to, and observe, and in addition plural, ness, and past. Stop words are retained (although including function words may seem counterproductive from the perspective of semantic vectors in natural language processing applications, they are retained in the present approach for two reasons. First, since in our model semantic vectors drive speech production, in order to model the production of function words, semantic


vectors for function words are required. Second, although the semantic vectors of function words typically are less informative (they typically have small association strengths to very large numbers of words), they still show structure, as illustrated in Appendix C for pronouns and prepositions); inflectional endings are replaced by the inflectional functions they subserve, and for derived words the semantic function of the derivational suffix is identified as well. In what follows, we outline in more detail what considerations motivate this way of constructing the input for our algorithm constructing semantic vectors. We note here that our method for handling morphologically complex words can be combined with any of the currently available algorithms for building semantic vectors. We also note here that we are not concerned with the question of how lexomes for plurality or tense might be induced from word forms, from world knowledge, or any combination of the two. All we assume is, first, that anyone understanding the sentence the boys' happiness was great to observe knows that more than one boy is involved, and that the narrator situates the event in the past. Second, we take this understanding to drive learning.

We distinguished the following seven inflectional functions: comparative and superlative for adjectives; singular and plural for nouns; and past, perfective, continuous, persistence (denoting the persistent relevance of the predicate in sentences such as London is a big city), and person3 (third person singular) for verbs.

The semantics of derived words tend to be characterized by idiosyncrasies [70] that can be accompanied by formal idiosyncrasies (e.g., business and resound), but often are formally unremarkable (e.g., heady, with meanings as diverse as intoxicating, exhilarating, impetuous, and prudent). We therefore paired derived words with their own content lexomes, as well as with a lexome for the derivational function expressed in the derived word. We implemented the following derivational lexomes: ordinal for ordinal numbers; not for negative un and in; undo for reversative un; other for nonnegative in and its allomorphs; ion for ation, ution, ition, and ion; ee for ee; agent for agents with er; instrument for instruments with er; impagent for impersonal agents with er; causer for words with er expressing causers (the differentiation of different semantic functions for er was based on manual annotation of the list of forms with er; the assignment of a given form to a category was not informed by sentence context and hence can be incorrect); again for re; and ness, ity, ism, ist, ic, able, ive, ous, ize, ence, ful, ish, under, sub, self, over, out, mis, and dis for the corresponding prefixes and suffixes. This set-up of lexomes is informed by the literature on the semantics of English word formation [71–73] and targets affixal semantics irrespective of affixal forms (following [74]).

This way of setting up the lexomes for the semantic vector space model illustrates an important aspect of the present approach, namely, that lexomes target what is understood and not particular linguistic forms. Lexomes are not units of form. For example, in the study of Geeraert et al. [75], the forms die, pass away, and kick the bucket are all linked to the same lexome die. In the present study, we have not attempted to identify idioms, and likewise no attempt was made to disambiguate homographs. These are targets for further research.

In the present study we did implement the classical distinction between inflection and word formation, by representing inflected words with a content lexome for their stem, but derived words with a content lexome for the derived word itself. In this way, the sentence scientists believe that exposure to shortwave ultraviolet rays can cause blindness is associated with the following lexomes: scientist, pl, believe, persistence, that, exposure, sg, to, shortwave, ultraviolet, ray, can, present, cause, blindness, and ness. In our model, sentences constitute the learning events, i.e., for each sentence we train the model to predict the lexomes in that sentence from the very same lexomes in that sentence. Sentences, or for spoken corpora utterances, are more ecologically valid than windows of, say, two words preceding and two words following a target word: the interpretation of a word may depend on the company that it keeps at long distances in a sentence. We also did not remove any function words, contrary to standard practice in distributional semantics (see Appendix C for a heatmap illustrating the kind of clustering observable for pronouns and prepositions). The inclusion of semantic vectors for both affixes and function words is necessitated by the general framework of our model, which addresses not only comprehension but also production, and which in the case of comprehension needs to move beyond lexical identification, as inflectional functions play an important role during the integration of words in sentence and discourse.

Only words with a frequency exceeding 8 occurrences in the tasa corpus were assigned lexomes. This threshold was imposed in order to avoid excessive data sparseness when constructing the distributional vector space. The number of different lexomes that met the criterion for inclusion was 23,562.

We constructed a semantic vector space by training an ndl network on the tasa corpus, using the ndl2 package for R [76]. Weights on lexome-to-lexome connections were recalibrated sentence by sentence, in the order in which they appear in the tasa corpus, using the learning rule of ndl (i.e., a simplified version of the learning rule of Rescorla and Wagner [43]) that has only two free parameters: the maximum amount of learning λ (set to 1) and a learning rate ρ (set to 0.001). This resulted in a 23,562 × 23,562 matrix. Sentence-based training keeps the carbon footprint of the model down, as the number of learning events is restricted to the number of utterances (approximately 750,000) rather than the number of word tokens (approximately 10,000,000). The row vectors of the resulting matrix, henceforth S, are the semantic vectors that we will use in the remainder of this study (work is in progress to further enhance the S matrix by including word sense disambiguation and named entity recognition when setting up lexomes; the resulting lexomic version of the TASA corpus and scripts (in python as well as in R) for deriving the S matrix will be made available at http://www.sfs.uni-tuebingen.de/~hbaayen).
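The following R sketch illustrates the sentence-by-sentence error-driven updating described above with a Rescorla-Wagner style rule (λ = 1, ρ = 0.001); it is a minimal illustration with invented toy sentences and lexomes, not the ndl2 implementation that was used for the actual study.

```r
# Minimal sketch of sentence-based Rescorla-Wagner updating (illustrative only);
# the lexome inventory and the two toy "sentences" below are invented.
lexomes   <- c("the", "boy", "happiness", "be", "great", "to", "observe", "PL", "NESS", "PAST")
sentences <- list(
  c("the", "boy", "happiness", "be", "great", "to", "observe", "PL", "NESS", "PAST"),
  c("the", "boy", "be", "great", "PAST")
)

lambda <- 1      # maximum amount of learning
rho    <- 0.001  # learning rate

W <- matrix(0, nrow = length(lexomes), ncol = length(lexomes),
            dimnames = list(lexomes, lexomes))

for (s in sentences) {
  cues    <- unique(s)                         # lexomes in the sentence act as cues ...
  present <- as.numeric(lexomes %in% s)        # ... and as the outcomes to be predicted
  pred    <- colSums(W[cues, , drop = FALSE])  # summed support from the active cues
  delta   <- rho * (lambda * present - pred)   # error-driven adjustment for every outcome
  W[cues, ] <- W[cues, ] +
    matrix(delta, nrow = length(cues), ncol = length(lexomes), byrow = TRUE)
}

round(W["boy", ], 4)  # a row of W: the (toy) semantic vector of "boy"
```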

The weights on the main diagonal of the matrix tend to be high, unsurprisingly, as in each sentence on which the model is trained, each of the words in that sentence is an excellent predictor of that same word occurring in that sentence. When the focus is on semantic similarity, it is useful to set the main diagonal of this matrix to zero. However, for more general association strengths between words, the diagonal elements are informative and should be retained.

Unlike standard models of distributional semantics, we did not carry out any dimension reduction using, for instance, singular value decomposition. Because we do not work with latent variables, each dimension of our semantic space is given by a column vector of S, and hence each dimension is linked to a specific lexome. Thus, the row vectors of S specify for each of the lexomes how well this lexome discriminates between all the lexomes known to the model.

Although semantic vectors of length 23,562 can provide good results, vectors of this length can be challenging for statistical evaluation. Fortunately, it turns out that many of the column vectors of S are characterized by extremely small variance. Such columns can be removed from the matrix without loss of accuracy. In practice, we have found it suffices to work with approximately 4 to 5 thousand columns, selected by calculating column variances and using only those columns with a variance exceeding a threshold value.
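In code, this column screening amounts to a simple variance filter; a sketch (the number of retained columns is arbitrary here):

```r
# keep only the columns of the semantic matrix S whose variance exceeds a threshold
column_variances <- apply(S, 2, var)
threshold <- sort(column_variances, decreasing = TRUE)[5000]   # e.g., retain roughly 5000 columns
S_reduced <- S[, column_variances >= threshold, drop = FALSE]
dim(S_reduced)
```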

3.2. Validation of the Semantic Vector Space

As shown by Baayen et al. [59] and Milin et al. [58, 77], measures based on matrices such as S are predictive for behavioral measures such as reaction times in the visual lexical decision task, as well as for self-paced reading latencies. In what follows, we first validate the semantic vectors of S on two data sets: one data set with accuracies in paired associate learning, and one dataset with semantic similarity ratings. We then considered specifically the validity of semantic vectors for inflectional and derivational functions, by focusing first on the correlational structure of the pertinent semantic vectors, followed by an examination of the predictivity of the semantic vectors for semantic plausibility and semantic transparency ratings for derived words.

3.2.1. Paired Associate Learning

The paired associate learning (PAL) task is a widely used test in psychology for evaluating learning and memory. Participants are given a list of word pairs to memorize. Subsequently, at testing, they are given the first word and have to recall the second word. The proportion of correctly recalled words is the accuracy measure on the PAL task. Accuracy on the PAL test decreases with age, which has been attributed to cognitive decline over the lifetime. However, Ramscar et al. [78] and Ramscar et al. [79] provide evidence that the test actually measures the accumulation of lexical knowledge. In what follows, we use the data on PAL performance reported by desRosiers and Ivison [80]. We fitted a linear mixed model to accuracy in the PAL task as a function of the Pearson correlation r of paired words' semantic vectors in S (but with weights on the main diagonal included, and using the 4275 columns with highest variance), with random intercepts for word pairs, sex and age as control variables, and, crucially, an interaction of r by age. Given the findings of Ramscar and colleagues, we expect to find that the slope of r (which is always negative) as a predictor of PAL

accuracy increases with age (indicating decreasing accuracy). Table 1 shows that this prediction is borne out. For age group 20–29 (the reference level of age), the slope for r is estimated at 0.31. For the next age level, this slope is adjusted upward by 0.12, and as age increases, these upward adjustments likewise increase, to 0.21, 0.24, and 0.32 for age groups 40–49, 50–59, and 60–69, respectively. Older participants know their language better, and hence are more sensitive to the semantic similarity (or lack thereof) of the words that make up PAL test pairs. For the purposes of the present study, the solid support for r as a predictor for PAL accuracy contributes to validating the row vectors of S as semantic vectors.

Figure 1: Similarity rating in the MEN dataset as a function of the correlation r between the row vectors in S of the words in the test pairs.

3.2.2. Semantic Relatedness Ratings

We also examined performance of S on the MEN test collection [81], which provides crowd-sourced ratings of semantic similarity for 3000 word pairs. For 2267 word pairs, semantic vectors are available in S for both words. Figure 1 shows that there is a nonlinear relation between the correlation of words' semantic vectors in S and the ratings. The plot shows as well that for low correlations the variability in the MEN ratings is larger. We fitted a Gaussian location-scale additive model to this data set, summarized in Table 2, which supported r as a predictor for both the mean MEN rating and the variability in the MEN ratings.

To put this performance in perspective, we collected the latent semantic analysis (LSA) similarity scores for the MEN word pairs using the website at http://lsa.colorado.edu. The Spearman correlation for LSA scores and MEN ratings was 0.697, and that for r was 0.704 (both p < 0.0001). Thus, our semantic vectors perform on a par with those of LSA, a well-established older technique that still enjoys wide use in psycholinguistics. Undoubtedly, optimized techniques from computational linguistics such as word2vec will outperform our model. The important point here is that even with training on full sentences rather than using small windows, and even when including function


Table 1: Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [−4.88, −2.53]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

A. parametric coefficients      Estimate   Std. Error   t-value    p-value
intercept                        3.6220     0.6822       5.3096    < 0.0001
r                                0.3058     0.1672       1.8293      0.0691
age=39                           0.2297     0.1869       1.2292      0.2207
age=49                           0.4665     0.1869       2.4964      0.0135
age=59                           0.6078     0.1869       3.2528      0.0014
age=69                           0.8029     0.1869       4.2970    < 0.0001
sex=male                        -0.1074     0.0230      -4.6638    < 0.0001
r:age=39                         0.1167     0.0458       2.5490      0.0117
r:age=49                         0.2090     0.0458       4.5640    < 0.0001
r:age=59                         0.2463     0.0458       5.3787    < 0.0001
r:age=69                         0.3239     0.0458       7.0735    < 0.0001

B. smooth terms                  edf        Ref.df       F-value    p-value
random intercepts word pair     17.8607    18.0000     128.2283    < 0.0001

Table 2: Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location: parameter estimating the mean; scale: parameter estimating the variance through the logb link function (η = log(σ − b)).

A. parametric coefficients      Estimate   Std. Error   t-value     p-value
Intercept [location]            25.2990     0.1799      140.6127    < 0.0001
Intercept [scale]                2.1297     0.0149      143.0299    < 0.0001

B. smooth terms                  edf        Ref.df       F-value     p-value
TPRS r [location]                8.3749     8.8869     2766.2399    < 0.0001
TPRS r [scale]                   5.9333     7.0476       87.5384    < 0.0001

words, performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.
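The two models summarized in Tables 1 and 2 can be approximated with the mgcv package; the sketch below reflects our reading of the model specifications as reported in the tables (the data frames pal_data and men_data and their column names are hypothetical), not the original analysis scripts.

```r
library(mgcv)

# Table 1: PAL accuracy as a function of r, age (a factor), and sex, with random
# intercepts for word pair fitted as a ridge-penalized smooth (bs = "re")
pal_model <- gam(accuracy ~ r * age + sex + s(word_pair, bs = "re"),
                 data = pal_data)

# Table 2: Gaussian location-scale model for the MEN similarity ratings;
# both the mean and the variance are allowed to vary with r
men_model <- gam(list(rating ~ s(r),   # location part
                      ~ s(r)),         # scale part (log link for sigma)
                 family = gaulss(), data = men_data)
summary(men_model)
```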

3.2.3. Correlational Structure of Inflectional and Derivational Vectors

Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner, and tend to be either not correlated with derivational lexomes (indicated by light yellow) or to be negatively correlated (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show small positive correlations for a majority of pairs, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes, and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3,500 derived words for 31 different formatives, resulting in a 3500 × 23562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82] (for LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3531 × 814). LDA accuracy for this classification task with 31 possible outcomes was 0.72. All 31 derivational lexomes were correctly classified.

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across

10 permutations: 0.519–0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.

Figure 2: Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes (individual words will become legible by zooming in on the figure at maximal magnification).
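A sketch of this classification step with the lda function of MASS (illustrative only; the matrix D_reduced and the factor formative are assumed to have been prepared as described above):

```r
library(MASS)

# D_reduced: 3531 x 814 matrix of semantic vectors (derived words plus the 31
#            derivational lexomes), after removal of near-zero-variance columns
# formative: factor with 31 levels giving each row's formative
lda_fit  <- lda(x = D_reduced, grouping = formative)
accuracy <- mean(predict(lda_fit, D_reduced)$class == formative)   # resubstitution accuracy

# permutation baseline: break the link between vectors and formatives
perm_acc <- replicate(10, {
  shuffled <- sample(formative)
  mean(predict(lda(x = D_reduced, grouping = shuffled), D_reduced)$class == shuffled)
})
c(accuracy = accuracy, baseline = mean(perm_acc))
```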

How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and at the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype. This vector is close to the cloud of semantic vectors, but it is outside its periphery. This pattern is not specific to ness but is found for the other derivational lexomes as well. It is intrinsic to our model.

The reason for this is straightforward. During learning, although for a derived word's lexome i and its derivational

lexome ness are co-present cues, the derivational lexome occurs in many other words j, and each time another word j is encountered, weights are reduced from ness to i. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property for our model to capture morphological productivity for comprehension and speech production.

Figure 3: Heatmap for the correlation matrix of lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed for the vectors obtained by subtracting the vector of the derived word from that of the base word, the result is a vector that is embedded inside the cluster of derived vectors, and hence inherits semantic idiosyncrasies from all these derived words.

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words

In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs


consisting of a base and a novel derived form (e.g., accent, accentable), reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs, due to the constraint that the base words had to occur in the tasa corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements such as those of age of acquisition may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].
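Given the matrix S, activation diversity is simply the L1-norm of a row vector; in R (the row name used here for the derivational lexome is hypothetical):

```r
# activation diversity (L1-norm) of each lexome's semantic vector in S
activation_diversity <- rowSums(abs(S))
activation_diversity["NESS"]   # e.g., for the derivational lexome of -ness (row name assumed)
```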

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increases. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies more strong links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive, and were therefore left out of the specification of the model reported here (word frequency was not included as predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by comparing the correlations of (i) the semantic vectors of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = -0.66), effectiveness (r = -0.51), awareness (r = -0.50), loneliness (r = -0.45), sickness (r = -0.44), and consciousness (r = -0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) and its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
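Such a comparison takes a single line per word; a sketch (the row names used for base, affix, and derived word are assumed for illustration):

```r
# correlate the corpus-derived vector of a derived word with the sum of the
# vectors of its base and its derivational lexome (here: business ~ busy + NESS)
cor(S["business", ], S["busy", ] + S["NESS", ])   # strongly negative for opaque derived words
```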

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the TASA corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output (a package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz; the package is described in Baayen et al. [85]). We begin with introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

C = \begin{array}{l|cccccccc}
      & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\
\hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array} \qquad (12)

where we use the DISC keyboard phonetic alphabet (the 'DIstinct Single Character' representation, with one character for each phoneme, was introduced by CELEX [86]) for triphones; the symbol # denotes the word boundary.



Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients            Estimate   Std. Error   t-value   p-value
Intercept                              -1.8072       2.7560   -0.6557    0.5127
Word Length                             0.5901       0.2113    2.7931    0.0057
Activation Diversity                    1.1221       0.6747    1.6632    0.0976
Arousal                                 0.3847       0.1521    2.5295    0.0121
Word Length × Activation Diversity     -0.1318       0.0500   -2.6369    0.0089

B. smooth terms                             edf       Ref.df   F-value   p-value
Random Intercepts Affix                  3.3610       4.0000   55.6723   < 0.0001

Suppose that the semantic vectors for these words are the row vectors of the following matrix S:

S = \begin{array}{l|ccc}
      & \text{one} & \text{two} & \text{three} \\
\hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array} \qquad (13)

We are interested in a transformation matrix F such that

CF = S (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S (15)
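As an illustration, the toy mapping can be computed in a few lines of R (a minimal sketch using the matrices of (12) and (13); ginv is the MASS function mentioned above):

    library(MASS)                      # provides ginv(), the Moore-Penrose generalized inverse

    # cue matrix C of (12): words (rows) by triphones (columns)
    C <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
                  0, 0, 0, 1, 1, 0, 0, 0,
                  0, 0, 0, 0, 0, 1, 1, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("one", "two", "three"),
                                c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")))

    # semantic matrix S of (13): words (rows) by semantic dimensions (columns)
    S <- matrix(c(1.0, 0.3, 0.4,
                  0.2, 1.0, 0.1,
                  0.1, 0.1, 1.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("one", "two", "three"), c("one", "two", "three")))

    Fmat <- ginv(C) %*% S              # F = C'S, cf. (15)
    round(C %*% Fmat, 2)               # CF reproduces S for this toy example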

For the present example

F = \begin{array}{l|ccc}
      & \text{one} & \text{two} & \text{three} \\
\hline
\text{\#wV} & 0.33 & 0.10 & 0.13 \\
\text{wVn}  & 0.33 & 0.10 & 0.13 \\
\text{Vn\#} & 0.33 & 0.10 & 0.13 \\
\text{\#tu} & 0.10 & 0.50 & 0.05 \\
\text{tu\#} & 0.10 & 0.50 & 0.05 \\
\text{\#Tr} & 0.03 & 0.03 & 0.33 \\
\text{Tri}  & 0.03 & 0.03 & 0.33 \\
\text{ri\#} & 0.03 & 0.03 & 0.33
\end{array} \qquad (16)

and for this simple example CF is exactly equal to S.

In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams.


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words. te: tensor product smooth; s: thin plate regression spline.

A. parametric coefficients                          Estimate   Std. Error    t-value    p-value
intercept [location]                                  5.5016       0.2564    21.4537    < 0.0001
intercept [scale]                                    -0.4135       0.0401   -10.3060    < 0.0001

B. smooth terms                                          edf       Ref.df    F-value    p-value
te(activation diversity, word length) [location]      5.3879       6.2897    23.7798     0.0008
s(derivational lexome) [location]                    14.3180      15.0000   380.6254    < 0.0001
s(activation diversity) [scale]                       2.1282       2.5993    28.4336    < 0.0001
s(word length) [scale]                                1.7722       2.1403   132.7311    < 0.0001

A comparison of the performance of the two models sheds light on the role of words' 'sound image' in word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features, if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is most strongly correlated with the target semantic vector s_i of all target vectors s_j across all words j.
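In code, this evaluation amounts to correlating every predicted row vector with every targeted row vector (a sketch continuing the toy objects C, S, and Fmat from the sketch above):

    Shat <- C %*% Fmat                          # predicted semantic vectors, one row per word

    # correlation of each predicted vector (rows of Shat) with each target (rows of S)
    r <- cor(t(Shat), t(S))

    # word i counts as recognized if its own target s_i has the highest correlation
    recognized <- apply(r, 1, which.max) == seq_len(nrow(S))
    mean(recognized)                            # proportion of correctly recognized words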

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, to a total of 11480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), (iii) the word being available in the CELEX database, and the word being of the abovementioned morphological type. In the present study we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset there were a total of 3465 unique trigrams, resulting in a 11480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10^-7. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.
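The sketch below illustrates how such a letter trigram cue matrix and the variance-based column selection can be set up; the words and the random matrix standing in for S are illustrative only, not the TASA-derived objects used in the study:

    words <- c("hand", "hands", "handful")                  # stand-in orthographic forms

    # letter trigrams, with '#' marking the word boundary
    trigrams <- function(w) {
      w <- paste0("#", w, "#")
      substring(w, 1:(nchar(w) - 2), 3:nchar(w))
    }
    cues <- sort(unique(unlist(lapply(words, trigrams))))
    C <- t(sapply(words, function(w) as.numeric(cues %in% trigrams(w))))
    colnames(C) <- cues                                     # words x trigrams indicator matrix

    # retain the columns of the semantic matrix with the highest variances
    S <- matrix(rnorm(length(words) * 10), nrow = length(words))   # stand-in semantic vectors
    S <- S[, order(apply(S, 2, var), decreasing = TRUE)[1:5]]      # here: top 5 columns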

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; for the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated, in the mean, with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all 6595) than derived words (898), whereas the number of different inflectional functions (7) was much smaller than the number of derivational functions (24). However, the negative correlations of the derivational semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own 'inner voice' while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping a 11480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ1 obtained with network F of the first route, and a vector ŝ2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
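Schematically, the two routes can be coupled with the same least-squares machinery throughout; the following sketch uses small random stand-ins for the trigram, triphone, and semantic matrices:

    library(MASS)

    set.seed(1)
    n        <- 50                                        # stand-in: 50 words
    Cletters <- matrix(rbinom(n * 30, 1, 0.3), n, 30)     # trigram indicator matrix
    Cphones  <- matrix(rbinom(n * 40, 1, 0.3), n, 40)     # triphone indicator matrix
    S        <- matrix(rnorm(n * 20), n, 20)              # semantic vectors

    Fmat <- ginv(Cletters) %*% S            # route 1: orthography -> semantics
    K    <- ginv(Cletters) %*% Cphones      # network K: orthography -> phonology
    That <- Cletters %*% K                  # predicted triphone support
    H    <- ginv(That) %*% S                # network H: predicted phonology -> semantics

    S1 <- Cletters %*% Fmat                 # s1: semantic vectors via the direct route
    S2 <- That %*% H                        # s2: semantic vectors via the indirect route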

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
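Continuing the stand-in objects S1, S2, and S from the previous sketch, the three measures can be computed directly (l1 is a small helper defined here, not a library function):

    l1 <- function(x) sum(abs(x))                   # L1-norm (city-block length) of a vector

    total_activation_diversity <- apply(S1, 1, l1) + apply(S2, 1, l1)
    route_congruency <- sapply(seq_len(nrow(S1)), function(i) cor(S1[i, ], S2[i, ]))
    prior <- apply(S, 1, l1)                        # L1-norm of a lexome's vector in S (rows in this sketch)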

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel): for all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
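A model with the structure reported in Table 5 can be specified with the mgcv package roughly as follows; blp is a synthetic stand-in for the actual BLP dataset, and the variable names are ours:

    library(mgcv)

    set.seed(1)                                     # synthetic stand-in data, for illustration only
    blp <- data.frame(RTinv = rnorm(500),
                      WordType = factor(sample(c("derived", "inflected", "monolexomic"), 500, TRUE)),
                      Length = sample(3:12, 500, TRUE),
                      TotActDiv = runif(500, 0.5, 5),
                      RouteCongruency = runif(500),
                      Prior = runif(500, 0.3, 2))

    gam_ldl <- bam(RTinv ~ WordType + Length +
                     s(TotActDiv) +                 # cf. s(total activation diversity) in Table 5
                     te(RouteCongruency, Prior),    # cf. te(route congruency, prior)
                   data = blp)
    summary(gam_ldl)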

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel): for medium-range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM was provided access to just the activation diversity of ŝ2, the AIC increased by 10059 units. However, when the model was based only on the activation diversity of ŝ1, the AIC increased by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.




Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error    t-value   p-value
intercept                          -1.7774       0.0087   -205.414   < 0.0001
word type = Inflected               0.1110       0.0059     18.742   < 0.0001
word type = Monomorphemic           0.0233       0.0054      4.322   < 0.0001
word length                         0.0117       0.0011     10.981   < 0.0001

B. smooth terms                        edf       Ref.df          F   p-value
s(total activation diversity)        6.002        7.222       23.6   < 0.0001
te(route congruency, prior)         14.673       17.850      21.37   < 0.0001


Figure 6: The interaction of total activation by total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients                     Estimate   Std. Error    t-value   p-value
intercept                                       -1.6934       0.0086   -195.933   < 0.0001
word type = Inflected                            0.0286       0.0052      5.486   < 0.0001
word type = Monomorphemic                        0.0042       0.0055      0.770     0.4
word length                                      0.0066       0.0011      5.893   < 0.0001

B. smooth terms                                     edf       Ref.df          F   p-value
te(activation, activation diversity, prior)       41.34        49.72      10.84   < 0.0001

The superiority of the second route may in part be due to the number of triphone features being larger than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough/) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as its primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts provided to us by the Distributed Little Red Hen Lab.



Figure 7: Oscillogram for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).


Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is 'band11-start3-median3-min2-max5-end3-part2'.

The audio files of this resource were automatically classified as 'clean' for relatively clean parts, where there is speech without background noise or music, and as 'noisy' for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a, with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows (the transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus \left(\begin{smallmatrix}3 & 8\\ 7 & 2\end{smallmatrix}\right)^T = \left(\begin{smallmatrix}3 & 7\\ 8 & 2\end{smallmatrix}\right). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed):

\begin{aligned}
CF &= S \\
C^{T}CF &= C^{T}S \\
(C^{T}C)^{-1}(C^{T}C)F &= (C^{T}C)^{-1}C^{T}S \\
F &= (C^{T}C)^{-1}C^{T}S
\end{aligned} \qquad (17)

In this way, matrix inversion is required for a much smaller matrix, C^T C, which is a square matrix of size 40639 × 40639.
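In R, this amounts to solving the normal equations rather than forming the generalized inverse of the large matrix C; the matrices below are small random stand-ins:

    set.seed(1)
    C <- matrix(rbinom(1000 * 50, 1, 0.1), 1000, 50)   # stand-in: many tokens, far fewer cues
    S <- matrix(rnorm(1000 * 20), 1000, 20)            # stand-in semantic vectors

    # F = (C^T C)^{-1} C^T S, cf. (17); only the small 50 x 50 matrix C^T C is inverted
    Fmat <- solve(crossprod(C), crossprod(C, S))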

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72; for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model: when presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

S = \begin{array}{l|ccc}
      & \text{one} & \text{two} & \text{three} \\
\hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array} \qquad (18)

as well as a C indicator matrix (12) specifying which triphones occur in which words. As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T = \begin{array}{l|cccccccc}
      & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\
\hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array} \qquad (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T (22)

we have

G = S′T (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂ (24)

For the present example, the inverse of S, S′, is

S′ = \begin{array}{l|ccc}
       & \text{one} & \text{two} & \text{three} \\
\hline
\text{one}   &  1.10 & -0.29 & -0.41 \\
\text{two}   & -0.21 &  1.07 & -0.02 \\
\text{three} & -0.09 & -0.08 &  1.04
\end{array} \qquad (25)

and the transformation matrix G is

G = \begin{array}{l|cccccccc}
      & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\
\hline
\text{one}   &  1.10 &  1.10 &  1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \\
\text{two}   & -0.21 & -0.21 & -0.21 &  1.07 &  1.07 & -0.02 & -0.02 & -0.02 \\
\text{three} & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 &  1.04 &  1.04 &  1.04
\end{array} \qquad (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
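The production mapping for the toy example mirrors the comprehension sketch given earlier (a minimal sketch; S and Tmat are the matrices of (18) and (19)):

    library(MASS)

    S <- matrix(c(1.0, 0.3, 0.4,
                  0.2, 1.0, 0.1,
                  0.1, 0.1, 1.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("one", "two", "three"), c("one", "two", "three")))
    Tmat <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
                     0, 0, 0, 1, 1, 0, 0, 0,
                     0, 0, 0, 0, 0, 1, 1, 1),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("one", "two", "three"),
                                   c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")))

    G    <- ginv(S) %*% Tmat                     # G = S'T, cf. (23)
    That <- S %*% G                              # predicted triphone support, cf. (24)
    round(That, 2)                               # near-identical to Tmat for this toy example

    # support for the triphones of a single word, cf. (21)
    S["one", , drop = FALSE] %*% G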

We made use of the S matrix, which we derived from the TASA corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m); those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words: targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
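A much simplified sketch of this ordering step for a single word is given below; the full heuristic algorithm is described in the Appendix, and the support values here are invented for illustration:

    library(igraph)

    # hypothetical triphone supports for the word 'two' ([tu]: #tu, tu#)
    support <- c("#tu" = 0.999, "tu#" = 0.998, "#Tr" = 0.01, "Tri" = 0.02, "ri#" = 0.01)
    active  <- names(support)[support > 0.99]               # threshold of 0.99

    # connect triphone 'xyz' to 'yzw' when they overlap in two segments
    edges <- do.call(rbind, lapply(active, function(a)
      do.call(rbind, lapply(active, function(b)
        if (substr(a, 2, 3) == substr(b, 1, 2)) c(a, b) else NULL))))
    g <- graph_from_edgelist(edges)

    starts <- active[substr(active, 1, 1) == "#"]           # word-initial (left-edge) triphones
    ends   <- active[substr(active, 3, 3) == "#"]           # word-final triphones
    paths  <- unlist(lapply(starts, function(s)
      all_simple_paths(g, from = s, to = ends)), recursive = FALSE)
    names(paths[[which.max(lengths(paths))]])               # longest path: "#tu" "tu#"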

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the Appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors on their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m_a} from S_m and add the semantic vector s_a of the affix:

S_a = S_{m_a} + i ⊗ s_a (27)

Here, i is a unit vector of length k and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

\begin{bmatrix} S_m \\ S_a \end{bmatrix} G_a = \begin{bmatrix} T_m \\ T_a \end{bmatrix} \qquad (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function, a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

\begin{bmatrix} S_m \\ S_{a_1} \\ S_{a_2} \\ \vdots \\ S_{a_n} \end{bmatrix} G = \begin{bmatrix} T_m \\ T_{a_1} \\ T_{a_2} \\ \vdots \\ T_{a_n} \end{bmatrix} \qquad (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
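Assembling the augmented matrices of (30) is straightforward; the sketch below does so for a single inflectional function with stand-in matrices, implementing the row-wise stacking of (27) by replicating the affix vector:

    set.seed(1)
    n_words <- 100; n_dims <- 20
    Sm       <- matrix(rnorm(n_words * n_dims), n_words, n_dims)   # stand-in: base words
    s_affix  <- rnorm(n_dims)                                      # stand-in: vector of one inflectional function

    idx <- sample(n_words, 40)                                     # base words attested with this function
    Sa  <- Sm[idx, ] + matrix(s_affix, length(idx), n_dims, byrow = TRUE)   # cf. (27)

    S_aug <- rbind(Sm, Sa)                                         # cf. (30), here for one function only
    # the triphone matrix would be stacked in the same row order, after which a single
    # mapping G is estimated from the augmented S and T matrices exactly as before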

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles; we followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where as before we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the CELEX database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with CELEX phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.
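The cross-validation loop itself has a simple structure (a skeleton with stand-in matrices; in the actual procedure the stems always remain in the training data and only the inflected forms are split over folds):

    set.seed(1)
    n     <- 200
    S     <- matrix(rnorm(n * 20), n, 20)           # stand-in semantic vectors
    Tmat  <- matrix(rbinom(n * 60, 1, 0.1), n, 60)  # stand-in triphone indicator matrix
    folds <- sample(rep(1:10, length.out = n))

    for (k in 1:10) {
      train <- folds != k
      G     <- solve(crossprod(S[train, ]), crossprod(S[train, ], Tmat[train, ]))  # training mapping
      That  <- S[!train, ] %*% G          # predicted triphone support for held-out forms, cf. (21)
      # ... order the supported triphones with the graph algorithm and score accuracy
    }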

The proportion of forms that were predicted correctly was 0.62; the proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure: in the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth the 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
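A mixed model of this general structure can be specified with mgcv, with random intercepts implemented as random-effect smooths (a sketch; buckeye is a synthetic stand-in for the actual token-level dataset, and the variable names are ours):

    library(mgcv)

    set.seed(1)                                    # synthetic stand-in data, for illustration only
    buckeye <- data.frame(LogRelDur     = rnorm(2000),
                          LogEdgeWeight = rnorm(2000),
                          PhraseRate    = rnorm(2000),
                          LogNcount     = rnorm(2000),
                          LogFrequency  = rnorm(2000),
                          Word    = factor(sample(paste0("w", 1:300), 2000, TRUE)),
                          Speaker = factor(sample(paste0("s", 1:40), 2000, TRUE)))

    gamm_dur <- bam(LogRelDur ~ LogEdgeWeight + PhraseRate + LogNcount +
                      s(LogFrequency) +
                      s(Word, bs = "re") + s(Speaker, bs = "re"),   # random intercepts
                    data = buckeye)
    summary(gamm_dur)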

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This, in turn, gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).


Complexity 25

Table 7 Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeyecorpus The trend for log frequency is positive accelerating TPRS thin plate regression spline

A parametric coefficients Estimate Std Error t-value p-valueIntercept -19165 00592 -323908 lt 00001Log LDL Edge Weight -01358 00400 -33960 00007Phrase Rate 00047 00021 22449 00248Log Ncount 01114 00189 58837 lt 00001B smooth terms edf Refdf F-value p-valueTPRS log Frequency 19007 19050 75495 00004random intercepts word 11498839 13230000 314919 lt 00001random intercepts speaker 302995 390000 345579 lt 00001

maps (TSOMs [139ndash141]) TSOMs also use error-drivenlearning and in recent work attention is also drawn toweak edges in wordsrsquo paths in the TSOM and the conse-quences thereof for lexical processing [142] We are not usingTSOMs however one reason being that in our experiencethey do not scale up well to realistically sized lexiconsA second reason is that we are pursuing the hypothesisthat form serves meaning and that self-organization ofform by itself is not necessary This hypothesis is likelytoo strong especially as we do not provide a rationale forthe trigram and triphone units that we use to representaspects of form It is worth noting that spatial organizationof form features is not restricted to TSOMs Although inour approach the spatial organization of triphone features isleft unspecified such an organization can be easily enforced(see [85] for further details) by using an algorithm suchas graphopt which self-organizes the vertices of a graphinto a two-dimensional plane Figure 9 was obtained withthis algorithm (Graphopt was developed for the layout oflarge graphs httpwwwschmuhlorggraphopt andis implemented in the igraph package Graphopt uses basicprinciples of physics to iteratively determine an optimallayout Each node in the graph is given both mass and anelectric charge and edges between nodes are modeled asspringsThis sets up a system inwhich there are attracting andrepelling forces between the vertices of the graph and thisphysical system is simulated until it reaches an equilibrium)
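As an illustration of this kind of layout, the sketch below applies graphopt, as implemented in igraph, to a small hypothetical triphone graph; the vertices and edges are made up for the example and are not the graph underlying Figure 9.

```r
library(igraph)

# toy directed triphone graph (hypothetical vertices and edges)
edges <- data.frame(from = c("#bl", "ble", "len", "end", "ndI", "dIN"),
                    to   = c("ble", "len", "end", "ndI", "dIN", "IN#"))
g <- graph_from_data_frame(edges, directed = TRUE)

# graphopt treats vertices as charged masses connected by springs and
# iterates this physical system towards an equilibrium layout in the plane
coords <- layout_with_graphopt(g)
plot(g, layout = coords, vertex.size = 25, edge.arrow.size = 0.3)
```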

5.4. Discussion. Our production network reconstructs known words' forms with high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the CELEX database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from CELEX and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the TASA corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the WEAVER model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to prevent graphs from becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one: it is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow and Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk ([natyrlək])) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically, compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L, specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...), was set up. We then calculated the matrix F by solving CF = L, and used F to calculate the estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
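The calculation just described can be sketched in a few lines of R; everything below (the random semantic vectors, the four learning events, the helper trigrams_of) is a hypothetical toy standing in for the actual matrices built from the 100 sentence tokens, and ginv is the generalized inverse from the MASS package (see Appendix A).

```r
library(MASS)   # ginv()
set.seed(1)

# hypothetical semantic vectors for a handful of lexomes (rows), standing in
# for the matrix obtained by training on the sentence tokens of Table 8
S <- matrix(rnorm(4 * 5), nrow = 4,
            dimnames = list(c("JohnClark", "JohnWelsch", "JanetClark", "write"),
                            paste0("dim", 1:5)))

# toy learning events: each word form read is paired with the lexome of the
# named entity (or verb) that it expresses
events <- data.frame(form   = c("John",      "Clark",     "John",       "Janet"),
                     lexome = c("JohnClark", "JohnClark", "JohnWelsch", "JanetClark"),
                     stringsAsFactors = FALSE)

# cue matrix C: which letter trigrams occur in each word form
trigrams_of <- function(w) {
  w <- paste0("#", w, "#")
  substring(w, 1:(nchar(w) - 2), 3:nchar(w))
}
all_trigrams <- unique(unlist(lapply(unique(events$form), trigrams_of)))
C <- t(sapply(events$form, function(f) as.numeric(all_trigrams %in% trigrams_of(f))))
colnames(C) <- all_trigrams

# lexome matrix L: the semantic vector of the lexome paired with each event
L <- S[events$lexome, ]

# solve C F = L and compute the estimated semantic vectors L-hat
Fmat <- ginv(C) %*% L
Lhat <- C %*% Fmat

# correlations of the vector estimated for the first event ('John') with the
# targeted semantic vectors of two named entities
cor(Lhat[1, ], S["JohnClark", ])
cor(Lhat[1, ], S["JanetClark", ])
```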

Figure 10, upper left panel, illustrates that, upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As, in the small world of the present toy example, Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones, in turn, are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from NDL networks using auditory targets to discriminate between lexomes [150].
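Because all of these mappings are linear, the indirect reading route just described (trigrams to triphones via K_a, then triphones to semantics via H_a) composes into a single linear map; the sketch below illustrates this with small random stand-in matrices (Co, Ta, and S here are toy objects, not the matrices of the study).

```r
library(MASS)   # ginv()
set.seed(2)

Co <- matrix(rbinom(4 * 6, 1, 0.4), nrow = 4)   # 4 words x 6 letter trigrams
Ta <- matrix(rbinom(4 * 5, 1, 0.4), nrow = 4)   # 4 words x 5 triphones
S  <- matrix(rnorm(4 * 3),          nrow = 4)   # 4 words x 3 semantic dimensions

Ka <- ginv(Co) %*% Ta          # network mapping trigrams onto auditory targets
Ha <- ginv(Ta) %*% S           # network mapping triphones onto semantic vectors

Shat_indirect <- Co %*% Ka %*% Ha          # trigrams -> triphones -> semantics
Shat_direct   <- Co %*% (ginv(Co) %*% S)   # the direct visual route, for comparison
```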


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

Sentence | Lexomes
John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student

[Figure 10 comprises four panels (John; John Welsch; Janet; Clark), each plotting Pearson correlations, ranging from −0.5 to 1.0, with the five named entities listed on the horizontal axis.]

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.


Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

[Figure 11 diagram: cochlea and retina (input); typing and speaking (output); auditory cues (FBS features); orthographic cues (trigrams); auditory targets (triphones); orthographic targets (trigrams); semantic vectors (TASA); with arcs labeled T_o, T_a, H_a, G_o, G_a, K_a, F_a, F_o, C_o, C_a, and the semantic matrix S.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.


Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but, importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
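For concreteness, the update performed after a single learning event under the Widrow-Hoff rule can be sketched as follows; the learning rate and the toy dimensions are arbitrary choices for the example.

```r
# Widrow-Hoff (delta rule) update for one learning event: W is the cue-to-
# outcome weight matrix, cue and target are the row vectors of the event.
widrow_hoff_update <- function(W, cue, target, eta = 0.01) {
  cue    <- matrix(cue,    nrow = 1)
  target <- matrix(target, nrow = 1)
  W + eta * t(cue) %*% (target - cue %*% W)   # move W towards reproducing target
}

# toy example: 3 cues, 2 outcome dimensions, a single learning event
W <- matrix(0, nrow = 3, ncol = 2)
W <- widrow_hoff_update(W, cue = c(1, 0, 1), target = c(1, -1))
```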

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over a thousand function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no fewer than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)
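To make the conjoint semantic vector concrete: with hypothetical vectors for the lexomes involved, the vector expressed by sapīvissēmus is simply the sum of the vectors of its lexical meaning and its inflectional functions (a toy sketch, not the actual semantic space).

```r
set.seed(3)
lexomes <- c("KNOW", "FIRST.PERSON", "PLURAL", "ACTIVE", "PLUPERFECT", "SUBJUNCTIVE")

# hypothetical semantic vectors (10 dimensions) for the relevant lexomes
S <- matrix(rnorm(length(lexomes) * 10), nrow = length(lexomes),
            dimnames = list(lexomes, NULL))

# conjoint semantic vector for sapivissemus: the summed lexome vectors
s_sapivissemus <- colSums(S)
```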

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, LDL, and naive discriminative learning (NDL). Because NDL makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In LDL, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as NDL networks but simply as linear discriminative learning networks. Actual incremental updating of LDL networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are, of course, context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it to be an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina, on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning, on the other.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension. (A very similar point was made by Levelt [177] in his critique of connectionist models in the nineties.)

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear: networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (−2, −3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\[
\mathbf{a} = (3, 4), \quad \mathbf{b} = (-2, -3). \tag{A.1}
\]


[Figure 12 plots the points a, b, x, and y in the plane, with both axes running from −4 to 4.]

Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\[
\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A.2}
\]

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\[
\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A.3}
\]

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and, likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:
\[
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A.4}
\]

Such a mapping of A onto B is given by a matrix F:

\[
\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A.5}
\]

and it is straightforward to verify that indeed
\[
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A.6}
\]

Using matrix notation, we can write

\[
\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A.7}
\]

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A.8}
\]

For numbers, the inverse of multiplication with s is dividing by s:
\[
\left(\frac{1}{s}\right)(sx) = x. \tag{A.9}
\]

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:

\[
\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A.10}
\]

For nonsquare matrices, the inverse Y⁻¹ is defined such that

\[
\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A.11}
\]

We find the square matrix F that maps the square matrix A onto B as follows:

\[
\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B} \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \tag{A.12}
\]

Since, for the present example, A and its inverse happen to be identical (A⁻¹ = A), we obtain (A.6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\[
\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A.13}
\]

Solving this equation proceeds exactly as above:

\[
\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A} \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \tag{A.14}
\]


The inverse of B is

\[
\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}, \tag{A.15}
\]

and hence we have

\[
\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
=
\begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix}
= \mathbf{G}. \tag{A.16}
\]

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
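The toy example of this appendix can be reproduced in R with the ginv function just mentioned; for these invertible 2 × 2 matrices, ginv coincides with the exact inverse (the code is purely illustrative).

```r
library(MASS)   # ginv(): Moore-Penrose generalized inverse

A <- matrix(c( 3,  4,
              -2, -3), nrow = 2, byrow = TRUE)   # form vectors as rows
B <- matrix(c(-5,  2,
               4, -1), nrow = 2, byrow = TRUE)   # semantic vectors as rows

Fmat <- ginv(A) %*% B   # comprehension mapping, cf. (A.12)
Gmat <- ginv(B) %*% A   # production mapping, cf. (A.14)

A %*% Fmat              # reproduces B
B %*% Gmat              # reproduces A
```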

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:
\[
\begin{pmatrix} 3 & 4 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \end{pmatrix}. \tag{A.17}
\]

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = −5 and o2 = 2 for the output vector (−5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × −2 = −5, and for o2 we have 3 × 2 + 4 × −1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (−5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say,

\[
\mathbf{s} = (2, 2), \tag{A.18}
\]

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
\[
\begin{pmatrix} 2 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -2 & 2 \end{pmatrix}. \tag{A.19}
\]
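Continuing the R sketch from above, this productivity amounts to multiplying the new form vector by the same matrix (again purely illustrative):

```r
s_new <- matrix(c(2, 2), nrow = 1)   # novel form vector
s_new %*% Fmat                       # yields the novel semantic vector (-2, 2)
```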

[Figure 13 diagram: form features enter at the input nodes, the connection weights (1, 2, −2, −1) constitute the transformation matrix, and semantic vectors come out at the output nodes.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × −2 = −5, and that at o2 is 3 × 2 + 4 × −1 = 2. The network maps the point (3, 4) onto the point (−5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word-initial boundary symbol) to a word's final triphone (the triphone ending with the word-final boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
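The path search can be sketched with igraph as follows; the triphones, supports, and the trailing '#' marking a word-final triphone are hypothetical illustrations, and the threshold-based construction of the candidate vertex set is omitted.

```r
library(igraph)

# toy candidate set: triphone vertices with hypothetical LDL supports on edges
edges <- data.frame(
  from    = c("#bl", "ble", "len", "end", "ndI", "dIN"),
  to      = c("ble", "len", "end", "ndI", "dIN", "IN#"),
  support = c(0.95, 0.90, 0.80, 0.40, 0.35, 0.30)
)
g <- graph_from_data_frame(edges, directed = TRUE)

# all simple paths out of the word-initial vertex; keep those that reach a
# word-final vertex
paths  <- all_simple_paths(g, from = "#bl", mode = "out")
finals <- Filter(function(p) grepl("#$", tail(p$name, 1)), paths)

# weight the raw support of each path by its length, and select the best path
score <- sapply(finals, function(p) {
  s <- E(g, path = p)$support
  sum(s) / length(s)
})
best <- finals[[which.max(score)]]
best$name   # triphone sequence selected as the acoustic image for articulation
```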

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.


For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics: Volume 1, An Introduction to the Theory of Formal Languages and Automata; Volume 2, Applications in Linguistic Theory; Volume 3, Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.

[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the Mental Lexicon: A Computational Model for Visual Word Recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The Effects of Feature-Label-Order and Their Implications for Symbolic Learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An Amorphous Model for Morphological Processing in Visual Comprehension Based on Naive Discriminative Learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in Part-of-Speech Tagging with an Application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom Variation: Experimental Data and a Blueprint of a Computational Model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling, et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The Mismeasurement of Mind: Life-Span Changes in Paired-Associate-Learning Scores Reflect the 'Cost' of Learning, Not Cognitive Decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of Speech Perception and Phonological Processing in Reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A Spreading-Activation Theory of Retrieval in Sentence Production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for test classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal, et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of Reading Aloud: Dual-Route and Parallel-Distributed-Processing Approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic Phonetic Transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL. R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Bearman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic Morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide Learning for Auditory Comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss Functions for Image Restoration With Neural Networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


two distinct proportional analogies: one analogy for going from form to meaning, and a second analogy for going from meaning to form. Consider, for instance, a semantic matrix S and a form matrix C, with rows representing words:

$$
\mathbf{S} = \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}, \qquad
\mathbf{C} = \begin{pmatrix} p_1 & p_2 \\ q_1 & q_2 \end{pmatrix} \tag{9}
$$

The transformation matrix G that maps S onto C,

$$
\mathbf{G} =
\underbrace{\frac{1}{a_1 b_2 - a_2 b_1}
\begin{pmatrix} b_2 & -a_2 \\ -b_1 & a_1 \end{pmatrix}}_{\mathbf{S}^{-1}}
\underbrace{\begin{pmatrix} p_1 & p_2 \\ q_1 & q_2 \end{pmatrix}}_{\mathbf{C}}
= \frac{1}{a_1 b_2 - a_2 b_1}
\left[
\begin{pmatrix} b_2 p_1 & b_2 p_2 \\ -b_1 p_1 & -b_1 p_2 \end{pmatrix}
+
\begin{pmatrix} -a_2 q_1 & -a_2 q_2 \\ a_1 q_1 & a_1 q_2 \end{pmatrix}
\right], \tag{10}
$$

can be written as the sum of two matrices (both of which are scaled by 1/(a_1 b_2 − a_2 b_1)). The first matrix takes the first form vector of C and weights it by the elements of the second semantic vector. The second matrix takes the second form vector of C and weights this by the elements of the first semantic vector of S. Thus, with this linear mapping, a predicted form vector is a semantically weighted mixture of the form vectors of C.
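As an illustration, the sketch below (with arbitrary made-up numbers standing in for the entries of S and C) computes G for this 2 × 2 toy case and verifies that SG recovers C:

```r
## Minimal numeric illustration of equations (9)-(10); all entries are
## hypothetical values, not data from the paper.
S <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)     # rows: semantic vectors (a1 a2; b1 b2)
C <- matrix(c(1.0, 0.2,
              0.1, 0.9), nrow = 2, byrow = TRUE) # rows: form vectors (p1 p2; q1 q2)
G <- solve(S) %*% C   # G = S^-1 C, the mapping from meaning to form
S %*% G               # recovers C: each predicted form vector is a semantically
                      # weighted mixture of the rows of C
```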

The remainder of this paper is structured as follows. We first introduce the semantic vector space S that we will be using, which we derived from the TASA corpus [63, 64]. Next, we show that we can model word comprehension by means of a matrix (or network) F that transforms the cue row vectors of C into the semantic row vectors of S, i.e., CF = S. We then show that we can model the production of word forms given their semantics by a transformation matrix G, i.e., SG = C, where C is a matrix specifying for each word which triphones it contains. As the network is informative only about which triphones are at issue for a word, but remains silent about their order, an algorithm building on graph theory is used to properly order the triphones. All models are evaluated on their accuracy and are also tested against experimental data.

In the general discussion, we will reflect on the consequences for linguistic theory of our finding that it is indeed possible to model the lexicon and lexical processing using systemic discriminative learning, without requiring 'hidden units' such as morphemes and stems.

3. A Semantic Vector Space Derived from the TASA Corpus

The first part of this section describes how we constructed semantic vectors. Specific to our approach is that we constructed semantic vectors not only for content words but also for inflectional and derivational functions. The semantic vectors for these functions are of especial importance for the speech production component of the model. The second part of this section addresses the validation of the semantic vectors, for which we used paired associate learning scores, semantic relatedness ratings, and semantic plausibility and semantic transparency ratings for derived words.

3.1. Constructing the Semantic Vector Space. The semantic vector space that we use in this study is derived from the TASA corpus [63, 64]. We worked with 752,130 sentences from this corpus, to a total of 10,719,386 word tokens. We used the TreeTagger software [65] to obtain, for each word token, its stem and a part-of-speech tag. Compounds written with a space, such as apple pie and jigsaw puzzle, were, when listed in the CELEX lexical database [66], joined into one onomasiological unit. Information about words' morphological structure was retrieved from CELEX.

In computational morphology, several options have been explored for representing the meaning of affixes. Mitchell and Lapata [67] proposed to model the meaning of an affix as a vector that, when added to the vector b of a base word, gives the vector d of the derived word. They estimated this vector by calculating d_i − b_i for all available pairs i of base and derived words, and taking the average of the resulting vectors. Lazaridou et al. [68] and Marelli and Baroni [60] modeled the semantics of affixes by means of matrix operations that take b as input and produce d as output, i.e.,

$$
\mathbf{d} = \mathbf{M}\mathbf{b}, \tag{11}
$$

whereas Cotterell et al. [69] constructed semantic vectors as latent variables in a Gaussian graphical model. What these approaches have in common is that they derive or impute semantic vectors given the semantic vectors of words produced by algorithms such as word2vec, algorithms which work with (stemmed) words as input units.

From the perspective of discriminative linguistics, however, it does not make sense to derive one meaning from another. Furthermore, rather than letting writing conventions dictate what units are accepted as input to one's favorite tool for constructing semantic vectors (including not and again, but excluding the prefixes un-, in-, and re-), we included not only lexomes for content words but also lexomes for prefixes and suffixes as units for constructing a semantic vector space. As a result, semantic vectors for prefixes and suffixes are obtained straightforwardly, together with semantic vectors for content words.

To illustrate this procedure, consider the sentence the boys' happiness was great to observe. Standard methods will apply stemming and remove stop words, resulting in the lexomes boy, happiness, great, and observe being the input to the algorithm constructing semantic vectors from lexical co-occurrences. In our approach, by contrast, the lexomes considered are the, boy, happiness, be, great, to, and observe, and in addition plural, ness, and past. Stop words are retained. Although including function words may seem counterproductive from the perspective of semantic vectors in natural language processing applications, they are retained in the present approach for two reasons. First, since in our model semantic vectors drive speech production, in order to model the production of function words, semantic vectors for function words are required. Second, although the semantic vectors of function words typically are less informative (they typically have small association strengths to very large numbers of words), they still show structure, as illustrated in Appendix C for pronouns and prepositions. Inflectional endings are replaced by the inflectional functions they subserve, and for derived words, the semantic function of the derivational suffix is identified as well. In what follows, we outline in more detail what considerations motivate this way of constructing the input for our algorithm constructing semantic vectors. We note here that our method for handling morphologically complex words can be combined with any of the currently available algorithms for building semantic vectors. We also note here that we are not concerned with the question of how lexomes for plurality or tense might be induced from word forms, from world knowledge, or any combination of the two. All we assume is, first, that anyone understanding the sentence the boys' happiness was great to observe knows that more than one boy is involved and that the narrator situates the event in the past. Second, we take this understanding to drive learning.
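For concreteness, the two codings of this example sentence can be written down as simple character vectors; this is a hand-coded illustration, not output of the paper's scripts, and the lexome names follow the conventions used in this section:

```r
## Standard stem-and-stopword coding versus the lexome coding used here for
## "the boys' happiness was great to observe".
standard_coding <- c("boy", "happiness", "great", "observe")
lexome_coding   <- c("the", "boy", "plural", "happiness", "ness",
                     "be", "past", "great", "to", "observe")
```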

We distinguished the following seven inflectional functions: comparative and superlative for adjectives; singular and plural for nouns; and past, perfective, continuous, persistence (which denotes the persistent relevance of the predicate in sentences such as London is a big city), and person3 (third person singular) for verbs.

The semantics of derived words tend to be characterized by idiosyncracies [70] that can be accompanied by formal idiosyncracies (e.g., business and resound), but often are formally unremarkable (e.g., heady, with meanings as diverse as intoxicating, exhilarating, impetuous, and prudent). We therefore paired derived words with their own content lexomes, as well as with a lexome for the derivational function expressed in the derived word. We implemented the following derivational lexomes: ordinal for ordinal numbers; not for negative un and in; undo for reversative un; other for nonnegative in and its allomorphs; ion for ation, ution, ition, and ion; ee for ee; agent for agents with er; instrument for instruments with er; impagent for impersonal agents with er; causer for words with er expressing causers (the differentiation of different semantic functions for er was based on manual annotation of the list of forms with er; the assignment of a given form to a category was not informed by sentence context and hence can be incorrect); again for re; and ness, ity, ism, ist, ic, able, ive, ous, ize, ence, ful, ish, under, sub, self, over, out, mis, and dis for the corresponding prefixes and suffixes. This set-up of lexomes is informed by the literature on the semantics of English word formation [71–73] and targets affixal semantics irrespective of affixal forms (following [74]).

This way of setting up the lexomes for the semantic vector space model illustrates an important aspect of the present approach, namely, that lexomes target what is understood, and not particular linguistic forms. Lexomes are not units of form. For example, in the study of Geeraert et al. [75], the forms die, pass away, and kick the bucket are all linked to the same lexome die. In the present study, we have not attempted to identify idioms, and likewise no attempt was made to disambiguate homographs. These are targets for further research.

In the present study, we did implement the classical distinction between inflection and word formation by representing inflected words with a content lexome for their stem, but derived words with a content lexome for the derived word itself. In this way, the sentence scientists believe that exposure to shortwave ultraviolet rays can cause blindness is associated with the following lexomes: scientist, pl, believe, persistence, that, exposure, sg, to, shortwave, ultraviolet, ray, can, present, cause, blindness, and ness. In our model, sentences constitute the learning events, i.e., for each sentence, we train the model to predict the lexomes in that sentence from the very same lexomes in that sentence. Sentences, or for spoken corpora utterances, are more ecologically valid than windows of, say, two words preceding and two words following a target word; the interpretation of a word may depend on the company that it keeps at long distances in a sentence. We also did not remove any function words, contrary to standard practice in distributional semantics (see Appendix C for a heatmap illustrating the kind of clustering observable for pronouns and prepositions). The inclusion of semantic vectors for both affixes and function words is necessitated by the general framework of our model, which addresses not only comprehension but also production, and which in the case of comprehension needs to move beyond lexical identification, as inflectional functions play an important role during the integration of words in sentence and discourse.

Only words with a frequency exceeding 8 occurrences in the TASA corpus were assigned lexomes. This threshold was imposed in order to avoid excessive data sparseness when constructing the distributional vector space. The number of different lexomes that met the criterion for inclusion was 23,562.

We constructed a semantic vector space by training an NDL network on the TASA corpus, using the ndl2 package for R [76]. Weights on lexome-to-lexome connections were recalibrated sentence by sentence, in the order in which they appear in the TASA corpus, using the learning rule of NDL (i.e., a simplified version of the learning rule of Rescorla and Wagner [43]) that has only two free parameters: the maximum amount of learning λ (set to 1) and a learning rate ρ (set to 0.001). This resulted in a 23,562 × 23,562 matrix. Sentence-based training keeps the carbon footprint of the model down, as the number of learning events is restricted to the number of utterances (approximately 750,000) rather than the number of word tokens (approximately 10,000,000). The row vectors of the resulting matrix, henceforth S, are the semantic vectors that we will use in the remainder of this study. (Work is in progress to further enhance the S matrix by including word sense disambiguation and named entity recognition when setting up lexomes. The resulting lexomic version of the TASA corpus, and scripts (in Python as well as in R) for deriving the S matrix, will be made available at http://www.sfs.uni-tuebingen.de/~hbaayen.)
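The learning rule itself can be stated in a few lines of R. The sketch below is not the ndl2 implementation; it uses two hypothetical toy learning events to illustrate how sentence-based updating of the lexome-to-lexome weights proceeds:

```r
## Minimal sketch of the simplified Rescorla-Wagner update used to build the
## lexome-by-lexome matrix S; `events` holds two made-up sentence codings.
events  <- list(c("the", "boy", "plural", "happiness", "ness", "be", "past",
                  "great", "to", "observe"),
                c("the", "boy", "be", "past", "sadness", "ness"))
lexomes <- sort(unique(unlist(events)))
lambda  <- 1       # maximum amount of learning
rho     <- 0.001   # learning rate
W <- matrix(0, length(lexomes), length(lexomes),
            dimnames = list(lexomes, lexomes))
for (ev in events) {
  cues <- unique(ev)
  act  <- colSums(W[cues, , drop = FALSE])         # current support for every outcome
  tgt  <- ifelse(colnames(W) %in% cues, lambda, 0) # present outcomes -> lambda, rest -> 0
  W[cues, ] <- W[cues, ] +
    rho * matrix(tgt - act, nrow = length(cues), ncol = ncol(W), byrow = TRUE)
}
round(W["ness", ], 4)   # a row vector of W plays the role of a semantic vector
```

Note that absent outcomes are pushed toward zero, which is the mechanism behind the weight reductions discussed later for derivational lexomes.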

The weights on the main diagonal of the matrix tend to be high, unsurprisingly, as in each sentence on which the model is trained, each of the words in that sentence is an excellent predictor of that same word occurring in that sentence. When the focus is on semantic similarity, it is useful to set the main diagonal of this matrix to zero. However, for more general association strengths between words, the diagonal elements are informative and should be retained.

Unlike standard models of distributional semantics, we did not carry out any dimension reduction, using, for instance, singular value decomposition. Because we do not work with latent variables, each dimension of our semantic space is given by a column vector of S, and hence each dimension is linked to a specific lexome. Thus, the row vectors of S specify, for each of the lexomes, how well this lexome discriminates between all the lexomes known to the model.

Although semantic vectors of length 23,562 can provide good results, vectors of this length can be challenging for statistical evaluation. Fortunately, it turns out that many of the column vectors of S are characterized by extremely small variance. Such columns can be removed from the matrix without loss of accuracy. In practice, we have found it suffices to work with approximately 4 to 5 thousand columns, selected by calculating column variances and using only those columns with a variance exceeding a threshold value.
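A minimal sketch of this column selection, with a small random matrix standing in for S and an arbitrary number of retained columns:

```r
## Discard low-variance columns of S before statistical evaluation.
set.seed(1)
S <- matrix(rnorm(1000), nrow = 10)            # toy stand-in for the real S
col_var   <- apply(S, 2, var)
threshold <- sort(col_var, decreasing = TRUE)[min(50, length(col_var))]
S_reduced <- S[, col_var >= threshold]         # keep only high-variance columns
dim(S_reduced)
```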

3.2. Validation of the Semantic Vector Space. As shown by Baayen et al. [59] and Milin et al. [58, 77], measures based on matrices such as S are predictive for behavioral measures such as reaction times in the visual lexical decision task, as well as for self-paced reading latencies. In what follows, we first validate the semantic vectors of S on two datasets: one dataset with accuracies in paired associate learning, and one dataset with semantic similarity ratings. We then considered specifically the validity of semantic vectors for inflectional and derivational functions, by focusing first on the correlational structure of the pertinent semantic vectors, followed by an examination of the predictivity of the semantic vectors for semantic plausibility and semantic transparency ratings for derived words.

3.2.1. Paired Associate Learning. The paired associate learning (PAL) task is a widely used test in psychology for evaluating learning and memory. Participants are given a list of word pairs to memorize. Subsequently, at testing, they are given the first word and have to recall the second word. The proportion of correctly recalled words is the accuracy measure on the PAL task. Accuracy on the PAL test decreases with age, which has been attributed to cognitive decline over the lifetime. However, Ramscar et al. [78] and Ramscar et al. [79] provide evidence that the test actually measures the accumulation of lexical knowledge. In what follows, we use the data on PAL performance reported by desRosiers and Ivison [80]. We fitted a linear mixed model to accuracy in the PAL task as a function of the Pearson correlation r of paired words' semantic vectors in S (but with weights on the main diagonal included, and using the 4275 columns with highest variance), with random intercepts for word pairs, sex and age as control variables, and, crucially, an interaction of r by age. Given the findings of Ramscar and colleagues, we expect to find that the slope of r (which is always negative) as a predictor of PAL accuracy increases with age (indicating decreasing accuracy). Table 1 shows that this prediction is borne out. For age group 20–29 (the reference level of age), the slope for r is estimated at 0.31. For the next age level, this slope is adjusted upward by 0.12, and as age increases, these upward adjustments likewise increase, to 0.21, 0.24, and 0.32 for age groups 40–49, 50–59, and 60–69, respectively. Older participants know their language better, and hence are more sensitive to the semantic similarity (or lack thereof) of the words that make up PAL test pairs. For the purposes of the present study, the solid support for r as a predictor for PAL accuracy contributes to validating the row vectors of S as semantic vectors.

Figure 1: Similarity rating in the MEN dataset as a function of the correlation r between the row vectors in S of the words in the test pairs.
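Returning to the PAL analysis summarized in Table 1, a sketch of how a model of this kind can be specified with the mgcv package is given below. This is not the authors' exact model call; the data frame is simulated, with the same column layout as the PAL dataset:

```r
library(mgcv)
## Hypothetical stand-in data: PAL accuracy, the correlation r, and factors
## age, sex, and word_pair (the last entering as random intercepts).
set.seed(1)
pal <- data.frame(
  accuracy  = rnorm(200, 15, 2),
  r         = runif(200, -0.5, -0.25),
  age       = factor(sample(c("20-29", "30-39", "40-49", "50-59", "60-69"), 200, TRUE)),
  sex       = factor(sample(c("female", "male"), 200, TRUE)),
  word_pair = factor(sample(paste0("pair", 1:20), 200, TRUE))
)
m_pal <- gam(accuracy ~ r * age + sex + s(word_pair, bs = "re"),
             data = pal, method = "REML")
summary(m_pal)
```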

322 Semantic Relatedness Ratings We also examined per-formance of S on the MEN test collection [81] that providesfor 3000 word pairs crowd sourced ratings of semanticsimilarity For 2267 word pairs semantic vectors are availablein S for both words Figure 1 shows that there is a nonlinearrelation between the correlation of wordsrsquo semantic vectorsin S and the ratings The plot shows as well that for lowcorrelations the variability in the MEN ratings is larger Wefitted a Gaussian location scale additive model to this dataset summarized in Table 2 which supported 119903 as a predictorfor both mean MEN rating and the variability in the MENratings
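A location-scale model of this kind can be sketched with mgcv's gaulss family, which models both the mean and the variance of the ratings as smooth functions of r; the data below are simulated stand-ins for the MEN pairs:

```r
library(mgcv)
## Sketch only: Gaussian location-scale GAM in the spirit of Table 2.
set.seed(1)
men <- data.frame(r = runif(500, 0, 0.5))
men$rating <- 10 + 60 * men$r + rnorm(500, 0, 10 - 10 * men$r)
m_men <- gam(list(rating ~ s(r),   # location part (mean rating)
                  ~ s(r)),         # scale part (log-sigma link)
             data = men, family = gaulss())
summary(m_men)
```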

To put this performance in perspective, we collected the latent semantic analysis (LSA) similarity scores for the MEN word pairs using the website at http://lsa.colorado.edu. The Spearman correlation for LSA scores and MEN ratings was 0.697, and that for r was 0.704 (both p < 0.0001). Thus, our semantic vectors perform on a par with those of LSA, a well-established older technique that still enjoys wide use in psycholinguistics. Undoubtedly, optimized techniques from computational linguistics such as word2vec will outperform our model. The important point here is that even with training on full sentences rather than using small windows, and even when including function words, performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.

Table 1: Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [−0.488, −0.253]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

A. parametric coefficients      Estimate   Std. Error   t-value    p-value
intercept                        3.6220     0.6822       5.3096   < 0.0001
r                                0.3058     0.1672       1.8293     0.0691
age=39                           0.2297     0.1869       1.2292     0.2207
age=49                           0.4665     0.1869       2.4964     0.0135
age=59                           0.6078     0.1869       3.2528     0.0014
age=69                           0.8029     0.1869       4.2970   < 0.0001
sex=male                        −0.1074     0.0230      −4.6638   < 0.0001
r:age=39                         0.1167     0.0458       2.5490     0.0117
r:age=49                         0.2090     0.0458       4.5640   < 0.0001
r:age=59                         0.2463     0.0458       5.3787   < 0.0001
r:age=69                         0.3239     0.0458       7.0735   < 0.0001
B. smooth terms                  edf        Ref.df       F-value    p-value
random intercepts word pair     17.8607    18.0000     128.2283   < 0.0001

Table 2: Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location: parameter estimating the mean; scale: parameter estimating the variance through the logb link function (η = log(σ − b)).

A. parametric coefficients   Estimate   Std. Error   t-value     p-value
Intercept [location]         25.2990     0.1799      140.6127   < 0.0001
Intercept [scale]             2.1297     0.0149      143.0299   < 0.0001
B. smooth terms               edf        Ref.df       F-value     p-value
TPRS r [location]             8.3749     8.8869      2766.2399  < 0.0001
TPRS r [scale]                5.9333     7.0476        87.5384  < 0.0001

3.2.3. Correlational Structure of Inflectional and Derivational Vectors. Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner, and tend to be either not correlated with derivational lexomes (indicated by light yellow) or to be negatively correlated (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show, for a majority of pairs, small positive correlations, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes, and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3500 derived words with 31 different formatives, resulting in a 3500 × 23,562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82] (for LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3531 × 814). LDA accuracy for this classification task with 31 possible outcomes was 0.72. All 31 derivational lexomes were correctly classified.

Figure 2: Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes (individual words become legible by zooming in on the figure at maximal magnification).

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across 10 permutations: 0.519–0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.
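The classification test and its permutation baseline can be sketched as follows with MASS::lda; here D and formative are simulated stand-ins for the real matrix of derived-word vectors and their formatives:

```r
library(MASS)
## Sketch: predict a derived word's formative from its semantic vector, and
## compare with a permutation baseline.  All data below are simulated.
set.seed(1)
formative <- factor(rep(c("ness", "less", "able"), each = 50))
D <- matrix(rnorm(150 * 10), nrow = 150) +
     model.matrix(~ formative - 1) %*% matrix(rnorm(3 * 10), nrow = 3)

fit <- lda(x = D, grouping = formative)
mean(predict(fit, D)$class == formative)     # LDA classification accuracy

perm     <- sample(formative)                # break the vector-formative link
fit_perm <- lda(x = D, grouping = perm)
mean(predict(fit_perm, D)$class == perm)     # chance-level baseline
```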

How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and in the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype: this vector is close to the cloud of semantic vectors, but it is outside its periphery. This pattern is not specific to ness, but is found for the other derivational lexomes as well. It is intrinsic to our model.

The reason for this is straightforward. During learning, although, for a derived word, its lexome i and its derivational lexome ness are co-present cues, the derivational lexome occurs in many other words j, and each time another word j is encountered, weights are reduced from ness to i. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property for our model to capture morphological productivity, for comprehension and speech production.

Figure 3: Heatmap for the correlation matrix of lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words become legible by zooming in on the figure at maximal magnification.)

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed of the vectors obtained by subtracting the vector of the base word from that of the derived word (d_i − b_i), the result is a vector that is embedded inside the cluster of derived vectors, and hence inherits semantic idiosyncracies from all these derived words.
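A sketch of this additive construction is given below, with made-up toy vectors in place of the real S and a hypothetical pairs_ness table of base/derived pairs:

```r
## Estimate an additive ness vector as the average difference between
## derived-word and base-word vectors, and compare it with a separately
## learned ness vector.  All values are simulated for illustration.
set.seed(1)
lex <- c("kind", "kindness", "dark", "darkness", "soft", "softness", "ness")
S   <- matrix(rnorm(length(lex) * 10), nrow = length(lex),
              dimnames = list(lex, paste0("dim", 1:10)))
pairs_ness <- cbind(base    = c("kind", "dark", "soft"),
                    derived = c("kindness", "darkness", "softness"))
ness_additive <- colMeans(S[pairs_ness[, "derived"], ] - S[pairs_ness[, "base"], ])
cor(ness_additive, S["ness", ])  # with the real S, the additive vector sits
                                 # inside the cluster of ness-derived words
```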

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs consisting of a base and a novel derived form (e.g., accent, accentable) reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs, due to the constraint that the base words had to occur in the TASA corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates a high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].
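Activation diversity is straightforward to compute; the sketch below assumes a matrix S with lexomes as row names (here a small random stand-in):

```r
## Activation diversity as the L1-norm (city-block length) of each row of S.
set.seed(1)
S <- matrix(rnorm(50), nrow = 5,
            dimnames = list(c("ness", "less", "able", "ish", "ize"), NULL))
activation_diversity <- rowSums(abs(S))
activation_diversity["ness"]   # value for the ness derivational lexome
```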

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words, and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases, and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies more strong links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive, and were therefore left out of the specification of the model reported here (word frequency was not included as a predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by correlating (i) the semantic vector of the base word, to which the semantic vector of the affix is added, with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = −0.66), effectiveness (r = −0.51), awareness (r = −0.50), loneliness (r = −0.45), sickness (r = −0.44), and consciousness (r = −0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) and its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
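This comparison can be sketched as follows, again with toy stand-ins for S and for the base/derived pairing:

```r
## Semantic transparency of a ness word as the correlation between its
## observed vector and the vector accumulated from base plus affix.
set.seed(1)
lex <- c("busy", "business", "kind", "kindness", "ness")
S   <- matrix(rnorm(length(lex) * 20), nrow = length(lex),
              dimnames = list(lex, NULL))
pairs_ness <- cbind(base = c("busy", "kind"), derived = c("business", "kindness"))
transparency <- apply(pairs_ness, 1, function(p)
  cor(S[p["base"], ] + S["ness", ], S[p["derived"], ]))
names(transparency) <- pairs_ness[, "derived"]
sort(transparency)   # the most negative values flag the least transparent words
```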

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the TASA corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output. (A package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz. The package is described in Baayen et al. [85].) We begin with introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension, and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

$$
\mathbf{C} \;=\;
\begin{array}{l|cccccccc}
      & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#}\\\hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0\\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0\\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}
\qquad (12)
$$

where we use the DISC keyboard phonetic alphabet (the "DIstinct Single Character" representation, with one character for each phoneme, introduced by CELEX [86]) for triphones; the symbol # denotes the word boundary. Suppose that the semantic vectors for these words are the row vectors

Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients               Estimate   Std. Error   t-value   p-value
Intercept                                -1.8072    2.7560       -0.6557   0.5127
Word Length                               0.5901    0.2113        2.7931   0.0057
Activation Diversity                      1.1221    0.6747        1.6632   0.0976
Arousal                                   0.3847    0.1521        2.5295   0.0121
Word Length × Activation Diversity       -0.1318    0.0500       -2.6369   0.0089
B. smooth terms                           edf        Ref.df       F-value   p-value
Random Intercepts Affix                   3.3610     4.0000       55.6723   < 0.0001

of the following matrix S:

$$
\mathbf{S} \;=\;
\begin{array}{l|ccc}
      & \text{one} & \text{two} & \text{three}\\\hline
\text{one}   & 1.0 & 0.3 & 0.4\\
\text{two}   & 0.2 & 1.0 & 0.1\\
\text{three} & 0.1 & 0.1 & 1.0
\end{array}
\qquad (13)
$$

We are interested in a transformation matrix F such that

CF = S (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S (15)

For the present example

$$
\mathbf{F} \;=\;
\begin{array}{l|ccc}
      & \text{one} & \text{two} & \text{three}\\\hline
\text{\#wV} & 0.33 & 0.10 & 0.13\\
\text{wVn}  & 0.33 & 0.10 & 0.13\\
\text{Vn\#} & 0.33 & 0.10 & 0.13\\
\text{\#tu} & 0.10 & 0.50 & 0.05\\
\text{tu\#} & 0.10 & 0.50 & 0.05\\
\text{\#Tr} & 0.03 & 0.03 & 0.33\\
\text{Tri}  & 0.03 & 0.03 & 0.33\\
\text{ri\#} & 0.03 & 0.03 & 0.33
\end{array}
\qquad (16)
$$
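For readers who wish to verify these numbers, the toy computation can be reproduced with a few lines of R (a minimal sketch: only the ginv function of the MASS package is assumed, and Fmat is used as a variable name because F is an alias of FALSE in R).

```r
library(MASS)   # provides ginv() for the Moore-Penrose generalized inverse

# Cue matrix C of (12): which triphones (columns) occur in which words (rows)
C <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
              0, 0, 0, 1, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"),
                            c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")))

# Semantic matrix S of (13)
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"), c("one", "two", "three")))

# Transformation matrix F = C'S, cf. (15)
Fmat <- ginv(C) %*% S
dimnames(Fmat) <- list(colnames(C), colnames(S))
round(Fmat, 2)        # reproduces (16)
round(C %*% Fmat, 2)  # equal to S for this toy example
```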

For this simple example, CF is exactly equal to S. In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as for auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams. A comparison of the performance of the two models


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words. te: tensor product smooth; s: thin plate regression spline.

A. parametric coefficients                           Estimate   Std. Error   t-value    p-value
intercept [location]                                  5.5016     0.2564       21.4537   < 0.0001
intercept [scale]                                    -0.4135     0.0401      -10.3060   < 0.0001
B. smooth terms                                       edf        Ref.df       F-value    p-value
te(activation diversity, word length) [location]      5.3879     6.2897       23.7798     0.0008
s(derivational lexome) [location]                    14.3180    15.0000      380.6254   < 0.0001
s(activation diversity) [scale]                       2.1282     2.5993       28.4336   < 0.0001
s(word length) [scale]                                1.7722     2.1403      132.7311   < 0.0001

sheds light on the role of words' "sound image" in word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have the additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝi is most strongly correlated with the target semantic vector si of all target vectors sj across all words j.
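In code, this evaluation can be sketched as follows (a sketch under the assumption that Shat and S are matrices with one row per word, holding the predicted and targeted semantic vectors, respectively; the function name is ours).

```r
# Proportion of words whose predicted vector correlates most strongly with its
# own target vector (e.g., Shat = C %*% Fmat for the toy example above)
comprehension_accuracy <- function(Shat, S) {
  r <- cor(t(Shat), t(S))                 # r[i, j] = cor(Shat[i, ], S[j, ])
  mean(apply(r, 1, which.max) == seq_len(nrow(S)))
}
```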

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, to a total of 11480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), (iii)


the word being available in the CELEX database, and the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3465 unique trigrams, resulting in an 11480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10⁻⁷. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.
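A cue matrix of this kind can be built along the following lines (a minimal sketch; the helper functions are ours and are not part of the WpmWithLdl package; # marks the word boundary).

```r
# Extract the letter trigrams of a word, with '#' as the word boundary symbol
letter_trigrams <- function(word) {
  chars <- strsplit(paste0("#", word, "#"), "")[[1]]
  sapply(seq_len(length(chars) - 2),
         function(i) paste0(chars[i], chars[i + 1], chars[i + 2]))
}

# Binary cue matrix: words in the rows, trigrams in the columns
make_cue_matrix <- function(words) {
  grams <- lapply(words, letter_trigrams)
  cues  <- sort(unique(unlist(grams)))
  C <- matrix(0, nrow = length(words), ncol = length(cues),
              dimnames = list(words, cues))
  for (i in seq_along(words)) C[i, grams[[i]]] <- 1
  C
}

C_trigram <- make_cue_matrix(c("one", "two", "three"))
```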

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated, in the mean, with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derived words' predicted semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route, from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping a 11480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words, the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90 and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have, for each word, two estimated semantic vectors: a vector ŝ1, obtained with network F of the first route, and a vector ŝ2, obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
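The two routes can be sketched in R as follows, assuming a trigram cue matrix C_trigram, a triphone cue matrix C_triphone, and the semantic matrix S (all variable names are ours; at realistic scale the pseudoinverse would be computed more economically, see (17) below).

```r
library(MASS)

# First, direct route: orthography -> semantics
F1 <- ginv(C_trigram) %*% S
S1 <- C_trigram %*% F1                      # predicted semantic vectors s1

# Second, indirect route: orthography -> phonology -> semantics
K     <- ginv(C_trigram) %*% C_triphone     # network K: trigrams -> triphones
T_hat <- C_trigram %*% K                    # predicted triphone vectors
H     <- ginv(T_hat) %*% S                  # network H: predicted triphones -> semantics
S2    <- T_hat %*% H                        # predicted semantic vectors s2
```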

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11480 words in the current dataset are also included in the BLP. We derived three measures from each of the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
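A sketch of how these three measures can be computed from the objects introduced above (S1, S2, and S; the names are ours):

```r
# Total activation diversity: sum of the L1-norms of the two predicted vectors
total_activation_diversity <- rowSums(abs(S1)) + rowSums(abs(S2))

# Route congruency: correlation of the two predicted semantic vectors per word
route_congruency <- sapply(seq_len(nrow(S1)), function(i) cor(S1[i, ], S2[i, ]))

# Prior: L1-norm of each lexome's column vector in S (assuming named columns)
prior <- colSums(abs(S))[rownames(S1)]
```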

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
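The structure of this model can be sketched with mgcv as follows (the data frame blp and all variable names are ours, not the authors' actual analysis script; invRT stands for the inverse-transformed response latency).

```r
library(mgcv)

# Sketch of the GAM summarized in Table 5
m_ldl <- gam(invRT ~ word_type + word_length +
                     s(total_activation_diversity) +
                     te(route_congruency, prior),
             data = blp)
summary(m_ldl)
```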

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RTs in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel). For medium-range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with the total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign, going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given that the ldl measures predict the human behavioral data substantially better, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increased by 10059 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increased by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger

Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error   t-value    p-value
intercept                         -1.7774    0.0087       -205.414   < 0.0001
word type: Inflected               0.1110    0.0059         18.742   < 0.0001
word type: Monomorphemic           0.0233    0.0054          4.322   < 0.0001
word length                        0.0117    0.0011         10.981   < 0.0001
B. smooth terms                    edf        Ref.df        F          p-value
s(total activation diversity)      6.002      7.222         23.6      < 0.0001
te(route congruency, prior)       14.673     17.850         21.37     < 0.0001

Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients                      Estimate   Std. Error   t-value    p-value
intercept                                       -1.6934    0.0086       -195.933   < 0.0001
word type: Inflected                             0.0286    0.0052          5.486   < 0.0001
word type: Monomorphemic                         0.0042    0.0055          0.770   = 0.4
word length                                      0.0066    0.0011          5.893   < 0.0001
B. smooth terms                                  edf        Ref.df        F          p-value
te(activation, activation diversity, prior)     41.34      49.72         10.84     < 0.0001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL-scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts,

Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: The oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, is shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts, where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a, with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus $\left(\begin{smallmatrix}3 & 8\\ 7 & 2\end{smallmatrix}\right)^{T} = \left(\begin{smallmatrix}3 & 7\\ 8 & 2\end{smallmatrix}\right)$. For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus, a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

$$
\begin{aligned}
\mathbf{C}\mathbf{F} &= \mathbf{S}\\
\mathbf{C}^{T}\mathbf{C}\mathbf{F} &= \mathbf{C}^{T}\mathbf{S}\\
(\mathbf{C}^{T}\mathbf{C})^{-1}(\mathbf{C}^{T}\mathbf{C})\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S}\\
\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S}
\end{aligned}
\qquad (17)
$$

In this way, matrix inversion is required only for a much smaller matrix, C^T C, which is a square matrix of size 40639 × 40639.
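In R, this solution of the normal equations can be sketched in a single line (a sketch assuming C^T C is invertible; crossprod(C) computes C^T C without explicitly forming the transpose, and Fmat is used to avoid masking R's F).

```r
# Estimate F from the normal equations in (17)
Fmat <- solve(crossprod(C), crossprod(C, S))
```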

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-class classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output, which in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as the elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

$$
\mathbf{S} \;=\;
\begin{array}{l|ccc}
      & \text{one} & \text{two} & \text{three}\\\hline
\text{one}   & 1.0 & 0.3 & 0.4\\
\text{two}   & 0.2 & 1.0 & 0.1\\
\text{three} & 0.1 & 0.1 & 1.0
\end{array}
\qquad (18)
$$

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


$$
\mathbf{T} \;=\;
\begin{array}{l|cccccccc}
      & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#}\\\hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0\\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0\\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}
\qquad (19)
$$

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T (22)

we have

G = S′T (23)

Given G, we can predict the triphone matrix T from the semantic matrix S:

SG = T̂ (24)

For the present example, the generalized inverse S′ of S is

$$
\mathbf{S}' \;=\;
\begin{array}{l|ccc}
      & \text{one} & \text{two} & \text{three}\\\hline
\text{one}   &  1.10 & -0.29 & -0.41\\
\text{two}   & -0.21 &  1.07 & -0.02\\
\text{three} & -0.09 & -0.08 &  1.04
\end{array}
\qquad (25)
$$

and the transformation matrix G is

$$
\mathbf{G} \;=\;
\begin{array}{l|cccccccc}
      & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#}\\\hline
\text{one}   &  1.10 &  1.10 &  1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41\\
\text{two}   & -0.21 & -0.21 & -0.21 &  1.07 &  1.07 & -0.02 & -0.02 & -0.02\\
\text{three} & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 &  1.04 &  1.04 &  1.04
\end{array}
\qquad (26)
$$

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.

We made use of the S matrix which we derived from the TASA corpus as described in Section 3. The majority of the columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.
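This column selection can be sketched as follows (variable names are ours; T_matrix stands for the triphone matrix T, whose column dimension fixes the number of retained semantic dimensions).

```r
# Retain the n highest-variance columns of S, with n the number of distinct
# triphones, so that no variance threshold has to be chosen by hand
n <- ncol(T_matrix)
v <- apply(S, 2, var)
S_reduced <- S[, order(v, decreasing = TRUE)[seq_len(n)]]
```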

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the


longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
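For concreteness, this ordering step for a single word can be sketched with igraph as follows (this is our own illustrative helper, not the more general algorithm of the Appendix; t_hat is assumed to be a named vector of triphone supports, with # marking the word boundary and DISC one-character phone symbols, so that every triphone is a three-character string).

```r
library(igraph)

order_triphones <- function(t_hat, threshold = 0.99) {
  active <- names(t_hat)[t_hat > threshold]
  # connect two triphones when they overlap in two phones (e.g., #wV -> wVn)
  pairs  <- expand.grid(from = active, to = active, stringsAsFactors = FALSE)
  pairs  <- pairs[pairs$from != pairs$to &
                  substr(pairs$from, 2, 3) == substr(pairs$to, 1, 2), ]
  g      <- graph_from_data_frame(pairs, vertices = active)
  starts <- active[substr(active, 1, 1) == "#"]    # word-initial triphones
  ends   <- active[substr(active, 3, 3) == "#"]    # word-final triphones
  paths  <- unlist(lapply(starts, function(v)
              all_simple_paths(g, from = v, to = ends)), recursive = FALSE)
  names(paths[[which.max(lengths(paths))]])        # longest candidate path
}
```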

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the Appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m_a} from S_m and add the semantic vector s_a of the affix:

$$\mathbf{S}_a = \mathbf{S}_{m_a} + \mathbf{i} \otimes \mathbf{s}_a \qquad (27)$$

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

$$\mathbf{S}_a \mathbf{G}_a = \mathbf{T}_a \qquad (28)$$

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

$$
\begin{bmatrix}\mathbf{S}_m\\ \mathbf{S}_a\end{bmatrix}\mathbf{G}_a = \begin{bmatrix}\mathbf{T}_m\\ \mathbf{T}_a\end{bmatrix}
\qquad (29)
$$

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function, a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

$$
\begin{bmatrix}\mathbf{S}_m\\ \mathbf{S}_{a_1}\\ \mathbf{S}_{a_2}\\ \vdots\\ \mathbf{S}_{a_n}\end{bmatrix}\mathbf{G} =
\begin{bmatrix}\mathbf{T}_m\\ \mathbf{T}_{a_1}\\ \mathbf{T}_{a_2}\\ \vdots\\ \mathbf{T}_{a_n}\end{bmatrix}
\qquad (30)
$$

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
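A sketch of this joint mapping in R (variable names are ours; S_m, T_m, and the per-function blocks are assumed to have been constructed as described above):

```r
# Stack base words and all inflected words, then solve for a single network G
S_aug <- rbind(S_m, S_a1, S_a2)        # ...one block per inflectional function
T_aug <- rbind(T_m, T_a1, T_a2)
G     <- MASS::ginv(S_aug) %*% T_aug
T_hat <- S_aug %*% G                   # predicted triphone supports
```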

We selected 6595 inflected variants, which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the CELEX database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with CELEX phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus,


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense of blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones, without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the CELEX target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for trigrams changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.

Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL edge weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
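A sketch of the corresponding model specification in mgcv (the data frame buckeye and all variable names are ours, not the authors' actual analysis script):

```r
library(mgcv)

# Sketch of the GAMM summarized in Table 7, with random intercepts for
# word and speaker fitted as random-effect smooths
m_dur <- bam(log_relative_duration ~ log_ldl_edge_weight + phrase_rate +
                                     log_ncount + s(log_frequency) +
                                     s(word, bs = "re") + s(speaker, bs = "re"),
             data = buckeye)
summary(m_dur)
```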

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients    Estimate     Std. Error   t-value    p-value
Intercept                     -1.9165      0.0592       -32.3908   < 0.0001
Log LDL Edge Weight           -0.1358      0.0400        -3.3960     0.0007
Phrase Rate                    0.0047      0.0021         2.2449     0.0248
Log Ncount                     0.1114      0.0189         5.8837   < 0.0001
B. smooth terms                edf          Ref.df        F-value    p-value
TPRS log Frequency                1.9007      1.9050       7.5495     0.0004
random intercepts word         1149.8839   1323.0000      31.4919   < 0.0001
random intercepts speaker        30.2995     39.0000      34.5579   < 0.0001

maps (TSOMs, [139–141]). TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4. Discussion. Our production network reconstructs known words' forms with high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the CELEX database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model's predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from CELEX and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the TASA corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the WEAVER model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to prevent graphs from becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one: it is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow and Hoff [44], define a dual-route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization to novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., [tyk] for Dutch natuurlijk, [natyrlək]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically, compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
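The shape of this computation can be made concrete with a minimal R sketch. This is our illustration only, with a drastically reduced toy cue and lexome inventory rather than the actual training data: each row pairs the letter trigrams of one orthographic word (the cues) with the named-entity lexome it was read with (the outcome).

library(MASS)   # for ginv()

trigrams <- c("#jo", "joh", "ohn", "hn#", "#cl", "ark", "rk#", "#we", "sch", "ch#")
lexomes  <- c("JohnClark", "JohnWelsch")

C <- rbind(
  John.1 = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),   # "John" read in a John Clark sentence
  Clark  = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0),
  John.2 = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),   # "John" read in a John Welsch sentence
  Welsch = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)
colnames(C) <- trigrams

L <- rbind(
  John.1 = c(1, 0),
  Clark  = c(1, 0),
  John.2 = c(0, 1),
  Welsch = c(0, 1)
)
colnames(L) <- lexomes

F <- ginv(C) %*% L    # least-squares solution of C F = L
Lhat <- C %*% F       # estimated semantic vectors
round(Lhat, 2)        # the form John shows divided support for JohnClark and JohnWelsch

In this toy run, the rows for John receive split support for the two John lexomes, whereas summing the estimated vectors for John and Welsch singles out JohnWelsch, mirroring the pattern described for Figure 10.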

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As, in the small world of the present toy example, Janet is a unique personal name, there is no uncertainty about which named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by a network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin; network H_a is also part of the control loop for speech production (see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production). Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from NDL networks using auditory targets to discriminate between lexomes [150].

Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

[Figure 10: Correlations of the estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. Four panels (John, John Welsch, Janet, Clark), each plotting the Pearson correlation (roughly -0.5 to 1.0) of that form's estimated vector with each of the five named-entity vectors.]

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see [47] for some evidence). One such interaction is explored in Baayen et al. [85]: in their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

[Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features (orthographic and auditory targets: trigrams and triphones; semantic vectors: TASA; auditory cues: FBS features; orthographic cues: trigrams). Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.]

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
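A minimal sketch of such incremental updating is the following R function (our illustration, not the authors' implementation; the learning rate eta and the number of passes are arbitrary choices). It applies the Widrow-Hoff (delta) rule event by event to a matrix of cue vectors C and a matrix of target vectors S.

# incremental estimation of a linear mapping with the Widrow-Hoff rule:
# C holds one cue vector per learning event (rows), S the corresponding target vectors
widrow_hoff <- function(C, S, eta = 0.01, passes = 1000) {
  W <- matrix(0, ncol(C), ncol(S))            # start with all connection weights at zero
  for (pass in seq_len(passes)) {
    for (i in seq_len(nrow(C))) {
      c_i <- C[i, , drop = FALSE]             # 1 x (number of cues)
      s_i <- S[i, , drop = FALSE]             # 1 x (number of outcomes)
      W <- W + eta * t(c_i) %*% (s_i - c_i %*% W)   # move weights against the prediction error
    }
  }
  W
}

With a small learning rate and repeated presentation of the learning events, the weights approach, in the mean, the solution obtained with the generalized inverse, as discussed below under "No exemplars".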

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures (typically represented by graphs constructed over semantic primitives; see, e.g., [154]) are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (30 or 300 for Estonian?), about what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), about whether Semitic morphology requires operations on form on one or two tiers [155–157], and about how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the southeast of Papua New Guinea). The language has a substantial inventory of over a thousand function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings, on the one hand, and chunks of form, on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, LDL, and naive discriminative learning (NDL). Because NDL makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to update network weights very efficiently during incremental learning. In LDL, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as NDL networks, but simply as linear discriminative learning networks. Actual incremental updating of LDL networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4,000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4,000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes of older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina, on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning, on the other.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry; we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French); we teach morphology in schools; and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

$$ \mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A1} $$


[Figure 12: Points a and b can be transformed into points p and q by a linear transformation. The panel shows the points a, b, x, and y in the plane, with both the x and y axes running from -4 to 4.]

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

$$ \mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2} $$

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

$$ \mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3} $$

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (above, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:
$$ \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4} $$

Such a mapping of A onto B is given by a matrix F,

$$ \mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5} $$

and it is straightforward to verify that indeed
$$ \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6} $$

Using matrix notation we can write

$$ \mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A7} $$

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
$$ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8} $$

For numbers, the inverse of multiplication with s is dividing by s:
$$ \left(\frac{1}{s}\right)(s x) = x. \tag{A9} $$

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

$$ \mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10} $$

For nonsquare matrices, the inverse Y^{-1} is defined such that

$$ \mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11} $$

We find the square matrix F that maps the square matrix A onto the square matrix B as follows:
$$ \begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}. \end{aligned} \tag{A12} $$

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find the matrix G which maps B onto A, i.e.,

$$ \mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A13} $$

Solving this equation proceeds exactly as above:

$$ \begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}. \end{aligned} \tag{A14} $$


The inverse of B is

$$ \mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}, \tag{A15} $$

and hence we have

$$ \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} = \mathbf{G}. \tag{A16} $$

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
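These computations are easy to carry out in R. The following minimal sketch (ours, for illustration only) reproduces the worked example, using ginv from the MASS package as the drop-in replacement for solve when a matrix is singular; here both matrices happen to be invertible, so ginv and solve give the same result.

library(MASS)                                   # provides ginv()

A <- matrix(c( 3,  4,
              -2, -3), nrow = 2, byrow = TRUE)  # words' form vectors (rows)
B <- matrix(c(-5,  2,
               4, -1), nrow = 2, byrow = TRUE)  # words' semantic vectors (rows)

F <- solve(A) %*% B   # comprehension mapping: A F = B; here solve(A) equals A itself
G <- ginv(B) %*% A    # production mapping: B G = A; ginv() also handles singular matrices
A %*% F               # recovers B, as in (A6)
B %*% G               # recovers A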

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:
$$ \begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix}. \tag{A17} $$

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

$$ \mathbf{s} = (2, 2), \tag{A18} $$

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
$$ \begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix}. \tag{A19} $$
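Continuing the R sketch given above for (A2)–(A16), this productivity amounts to a single matrix product:

s <- c(2, 2)   # novel form vector, not used when F was estimated
s %*% F        # returns (-2, 2), the predicted semantic vector of (A19)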

[Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). Form features go in at the input nodes, the connection weights are the transformation matrix, and semantic vectors come out at the output nodes. When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5, and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).]

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the boundary symbol #) to a word's final triphone (the triphone ending with #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at that vertex are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths will trivially have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
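To illustrate the general idea (not the actual heuristic, which additionally applies the three support thresholds described above), the following R sketch builds a tiny triphone graph with igraph, enumerates the paths from an initial to a final triphone with all_simple_paths, and ranks them by length-weighted support; the triphones and their support values are made up for illustration.

library(igraph)

triphones <- c("#wa", "wal", "alk", "lks", "ks#")
support   <- c(0.9, 0.8, 0.7, 0.6, 0.95)      # hypothetical network support per triphone
names(support) <- triphones

# connect two triphones when segments 2-3 of the first equal segments 1-2 of the second
pairs <- expand.grid(from = triphones, to = triphones, stringsAsFactors = FALSE)
pairs <- pairs[substr(pairs$from, 2, 3) == substr(pairs$to, 1, 2), ]
g <- graph_from_data_frame(pairs, vertices = triphones)

initial <- triphones[substr(triphones, 1, 1) == "#"]
final   <- triphones[substr(triphones, 3, 3) == "#"]

paths <- unlist(lapply(initial, function(v)
  all_simple_paths(g, from = v, to = final)), recursive = FALSE)

# weighted support: summed triphone support divided by path length
path_support <- sapply(paths, function(p) sum(support[p$name]) / length(p))
paths[[which.max(path_support)]]$name   # the triphone path selected to drive articulation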

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

[Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).]


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling, et al., "ndl2: Naive discriminative learning, an implementation in R. R package," 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the "cost" of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal, et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL. R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Bearman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


vectors for function words are required. Second, although the semantic vectors of function words typically are less informative (they typically have small association strengths to very large numbers of words), they still show structure, as illustrated in Appendix C for pronouns and prepositions.) Inflectional endings are replaced by the inflectional functions they subserve, and for derived words the semantic function of the derivational suffix is identified as well. In what follows, we outline in more detail the considerations that motivate this way of constructing the input for our algorithm for constructing semantic vectors. We note here that our method for handling morphologically complex words can be combined with any of the currently available algorithms for building semantic vectors. We also note here that we are not concerned with the question of how lexomes for plurality or tense might be induced from word forms, from world knowledge, or any combination of the two. All we assume is, first, that anyone understanding the sentence the boys' happiness was great to observe knows that more than one boy is involved and that the narrator situates the event in the past. Second, we take this understanding to drive learning.

We distinguished the following seven inflectional functions: comparative and superlative for adjectives; singular and plural for nouns; and past, perfective, continuous, persistence (denotes the persistent relevance of the predicate in sentences such as London is a big city), and person3 (third person singular) for verbs.

The semantics of derived words tend to be characterized by idiosyncrasies [70] that can be accompanied by formal idiosyncrasies (e.g., business and resound), but often are formally unremarkable (e.g., heady, with meanings as diverse as intoxicating, exhilarating, impetuous, and prudent). We therefore paired derived words with their own content lexomes as well as with a lexome for the derivational function expressed in the derived word. We implemented the following derivational lexomes: ordinal for ordinal numbers, not for negative un and in, undo for reversative un, other for nonnegative in and its allomorphs, ion for ation, ution, ition, and ion, ee for ee, agent for agents with er, instrument for instruments with er, impagent for impersonal agents with er, causer for words with er expressing causers (the differentiation of the different semantic functions of er was based on manual annotation of the list of forms with er; the assignment of a given form to a category was not informed by sentence context and hence can be incorrect), again for re, and ness, ity, ism, ist, ic, able, ive, ous, ize, ence, ful, ish, under, sub, self, over, out, mis, and dis for the corresponding prefixes and suffixes. This set-up of lexomes is informed by the literature on the semantics of English word formation [71–73] and targets affixal semantics irrespective of affixal forms (following [74]).

This way of setting up the lexomes for the semantic vector space model illustrates an important aspect of the present approach, namely, that lexomes target what is understood, and not particular linguistic forms. Lexomes are not units of form. For example, in the study of Geeraert et al. [75], the forms die, pass away, and kick the bucket are all linked to the same lexome die. In the present study we have not attempted to identify idioms, and likewise no attempt was made to disambiguate homographs. These are targets for further research.

In the present study we did implement the classical distinction between inflection and word formation by representing inflected words with a content lexome for their stem, but derived words with a content lexome for the derived word itself. In this way, the sentence scientists believe that exposure to shortwave ultraviolet rays can cause blindness is associated with the following lexomes: scientist, pl, believe, persistence, that, exposure, sg, to, shortwave, ultraviolet, ray, can, present, cause, blindness, and ness. In our model, sentences constitute the learning events; i.e., for each sentence, we train the model to predict the lexomes in that sentence from the very same lexomes in that sentence. Sentences, or for spoken corpora utterances, are more ecologically valid than windows of, say, two words preceding and two words following a target word: the interpretation of a word may depend on the company that it keeps at long distances in a sentence. We also did not remove any function words, contrary to standard practice in distributional semantics (see Appendix C for a heatmap illustrating the kind of clustering observable for pronouns and prepositions). The inclusion of semantic vectors for both affixes and function words is necessitated by the general framework of our model, which addresses not only comprehension but also production, and which in the case of comprehension needs to move beyond lexical identification, as inflectional functions play an important role in the integration of words in sentence and discourse.

Only words with a frequency exceeding 8 occurrences in the TASA corpus were assigned lexomes. This threshold was imposed in order to avoid excessive data sparseness when constructing the distributional vector space. The number of different lexomes that met the criterion for inclusion was 23,562.

We constructed a semantic vector space by training an NDL network on the TASA corpus, using the ndl2 package for R [76]. Weights on lexome-to-lexome connections were recalibrated sentence by sentence, in the order in which they appear in the TASA corpus, using the learning rule of NDL (i.e., a simplified version of the learning rule of Rescorla and Wagner [43]) that has only two free parameters: the maximum amount of learning λ (set to 1) and a learning rate ρ (set to 0.001). This resulted in a 23,562 × 23,562 matrix. Sentence-based training keeps the carbon footprint of the model down, as the number of learning events is restricted to the number of utterances (approximately 750,000) rather than the number of word tokens (approximately 10,000,000). The row vectors of the resulting matrix, henceforth S, are the semantic vectors that we will use in the remainder of this study. (Work is in progress to further enhance the S matrix by including word sense disambiguation and named entity recognition when setting up lexomes. The resulting lexomic version of the TASA corpus, as well as scripts (in Python and in R) for deriving the S matrix, will be made available at http://www.sfs.uni-tuebingen.de/~hbaayen.)
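To make the training regime concrete, the following minimal sketch (our own illustration, not the ndl2 implementation; the toy sentences and the function name are invented) applies the two-parameter Rescorla-Wagner update sentence by sentence, with every lexome in a sentence serving both as cue and as outcome:

    # Rescorla-Wagner update for one learning event (one sentence); lambda and rho
    # correspond to the maximum amount of learning and the learning rate above.
    rw_update <- function(W, cues, outcomes, lambda = 1, rho = 0.001) {
      cues <- unique(cues); outcomes <- unique(outcomes)
      for (o in colnames(W)) {
        present <- o %in% outcomes
        total   <- sum(W[cues, o])        # summed support for outcome o from present cues
        W[cues, o] <- W[cues, o] + rho * ((if (present) lambda else 0) - total)
      }
      W
    }

    sentences <- list(c("scientist", "pl", "believe"),
                      c("exposure", "sg", "cause", "blindness", "ness"))
    lexomes <- sort(unique(unlist(sentences)))
    W <- matrix(0, length(lexomes), length(lexomes), dimnames = list(lexomes, lexomes))
    for (s in sentences) W <- rw_update(W, cues = s, outcomes = s)
    # The row vectors of W are the toy counterparts of the semantic vectors in S.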

The weights on themain diagonal of thematrix tend to behigh unsurprisingly as in each sentence on which the modelis trained each of the words in that sentence is an excellent

8 Complexity

predictor of that sameword occurring in that sentenceWhenthe focus is on semantic similarity it is useful to set the maindiagonal of this matrix to zero However for more generalassociation strengths between words the diagonal elementsare informative and should be retained

Unlike standard models of distributional semantics, we did not carry out any dimension reduction using, for instance, singular value decomposition. Because we do not work with latent variables, each dimension of our semantic space is given by a column vector of S, and hence each dimension is linked to a specific lexome. Thus, the row vectors of S specify, for each of the lexomes, how well this lexome discriminates between all the lexomes known to the model.

Although semantic vectors of length 23,562 can provide good results, vectors of this length can be challenging for statistical evaluation. Fortunately, it turns out that many of the column vectors of S are characterized by extremely small variance. Such columns can be removed from the matrix without loss of accuracy. In practice, we have found that it suffices to work with approximately 4 to 5 thousand columns, selected by calculating column variances and using only those columns with a variance exceeding a threshold value.
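In R, this amounts to thresholding the column variances of S; the cutoff shown below is the 0.34 × 10⁻⁷ value used later for the visual comprehension model, but any application-appropriate threshold can be substituted:

    # Keep only the columns of S whose variance exceeds a threshold.
    column_variances <- apply(S, 2, var)
    S_reduced <- S[, column_variances > 0.34e-7, drop = FALSE]
    ncol(S_reduced)   # on the order of 4 to 5 thousand columns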

3.2. Validation of the Semantic Vector Space. As shown by Baayen et al. [59] and Milin et al. [58, 77], measures based on matrices such as S are predictive for behavioral measures such as reaction times in the visual lexical decision task, as well as for self-paced reading latencies. In what follows, we first validate the semantic vectors of S on two data sets: one with accuracies in paired associate learning, and one with semantic similarity ratings. We then consider specifically the validity of the semantic vectors for inflectional and derivational functions by focusing first on the correlational structure of the pertinent semantic vectors, followed by an examination of the predictivity of the semantic vectors for semantic plausibility and semantic transparency ratings for derived words.

3.2.1. Paired Associate Learning. The paired associate learning (PAL) task is a widely used test in psychology for evaluating learning and memory. Participants are given a list of word pairs to memorize. Subsequently, at testing, they are given the first word and have to recall the second word. The proportion of correctly recalled words is the accuracy measure on the PAL task. Accuracy on the PAL test decreases with age, which has been attributed to cognitive decline over the lifetime. However, Ramscar et al. [78] and Ramscar et al. [79] provide evidence that the test actually measures the accumulation of lexical knowledge. In what follows, we use the data on PAL performance reported by desRosiers and Ivison [80]. We fitted a linear mixed model to accuracy in the PAL task as a function of the Pearson correlation r of the paired words' semantic vectors in S (but with weights on the main diagonal included, and using the 4275 columns with highest variance), with random intercepts for word pairs, sex and age as control variables, and, crucially, an interaction of r by age. Given the findings of Ramscar and colleagues, we expect to find that the slope of r (which is always negative) as a predictor of PAL accuracy increases with age (indicating decreasing accuracy).

Figure 1: Similarity rating in the MEN dataset as a function of the correlation r between the row vectors in S of the words in the test pairs.

Table 1 shows that this prediction is borne out. For age group 20–29 (the reference level of age), the slope for r is estimated at 0.31. For the next age level, this slope is adjusted upward by 0.12, and as age increases, these upward adjustments likewise increase, to 0.21, 0.24, and 0.32 for age groups 40–49, 50–59, and 60–69, respectively. Older participants know their language better, and hence are more sensitive to the semantic similarity (or lack thereof) of the words that make up the PAL test pairs. For the purposes of the present study, the solid support for r as a predictor of PAL accuracy contributes to validating the row vectors of S as semantic vectors.
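A sketch of this analysis is given below; the data frame and variable names are assumptions, and the random intercepts for word pair are implemented here as a random-effect smooth in mgcv (an lme4 formulation would serve equally well):

    library(mgcv)
    # accuracy ~ r, age, their interaction, sex as control, random intercepts for word pair
    pal_model <- gam(accuracy ~ r * age + sex + s(word_pair, bs = "re"),
                     data = pal)   # 'pal': one row per word pair x age group x sex (assumed)
    summary(pal_model)             # compare with Table 1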

3.2.2. Semantic Relatedness Ratings. We also examined the performance of S on the MEN test collection [81], which provides crowd-sourced ratings of semantic similarity for 3000 word pairs. For 2267 word pairs, semantic vectors are available in S for both words. Figure 1 shows that there is a nonlinear relation between the correlation of words' semantic vectors in S and the ratings. The plot shows as well that for low correlations, the variability in the MEN ratings is larger. We fitted a Gaussian location-scale additive model to this data set, summarized in Table 2, which supported r as a predictor for both the mean MEN rating and the variability in the MEN ratings.
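Such a location-scale model can be fitted with the gaulss family in mgcv; the sketch below (assumed data frame and column names) lets the correlation r drive both the mean rating and its variability:

    library(mgcv)
    men_model <- gam(list(rating ~ s(r),   # location: mean MEN rating
                          ~ s(r)),         # scale: variability of the ratings
                     family = gaulss(), data = men)
    summary(men_model)                     # compare with Table 2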

To put this performance in perspective, we collected the latent semantic analysis (LSA) similarity scores for the MEN word pairs using the website at http://lsa.colorado.edu. The Spearman correlation for LSA scores and MEN ratings was 0.697, and that for r was 0.704 (both p < 0.0001). Thus, our semantic vectors perform on a par with those of LSA, a well-established older technique that still enjoys wide use in psycholinguistics. Undoubtedly, optimized techniques from computational linguistics such as word2vec will outperform our model. The important point here is that even with training on full sentences rather than using small windows, and even when including function words, the performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.


Table 1: Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [−4.88, −2.53]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

A. parametric coefficients       Estimate    Std. Error   t-value    p-value
intercept                          3.6220      0.6822       5.3096   < 0.0001
r                                  0.3058      0.1672       1.8293     0.0691
age=39                             0.2297      0.1869       1.2292     0.2207
age=49                             0.4665      0.1869       2.4964     0.0135
age=59                             0.6078      0.1869       3.2528     0.0014
age=69                             0.8029      0.1869       4.2970   < 0.0001
sex=male                          −0.1074      0.0230      −4.6638   < 0.0001
r:age=39                           0.1167      0.0458       2.5490     0.0117
r:age=49                           0.2090      0.0458       4.5640   < 0.0001
r:age=59                           0.2463      0.0458       5.3787   < 0.0001
r:age=69                           0.3239      0.0458       7.0735   < 0.0001
B. smooth terms                    edf         Ref.df       F-value    p-value
random intercepts word pair      178.607     180.000      128.2283   < 0.0001

Table 2: Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location: parameter estimating the mean; scale: parameter estimating the variance through the logb link function (η = log(σ − b)).

A. parametric coefficients   Estimate   Std. Error   t-value     p-value
Intercept [location]         25.2990     0.1799      140.6127    < 0.0001
Intercept [scale]             2.1297     0.0149      143.0299    < 0.0001
B. smooth terms              edf         Ref.df       F-value     p-value
TPRS r [location]             8.3749     8.8869      2766.2399   < 0.0001
TPRS r [scale]                5.9333     7.0476        87.5384   < 0.0001


3.2.3. Correlational Structure of Inflectional and Derivational Vectors. Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner, and tend to be either not correlated with derivational lexomes (indicated by light yellow) or to be negatively correlated with them (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show small positive correlations for a majority of pairs, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes, and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3500 derived words with 31 different formatives, resulting in a 3500 × 23,562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82] (for LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3531 × 814). LDA accuracy for this classification task with 31 possible outcomes was 0.72. All 31 derivational lexomes were correctly classified.
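A simplified sketch of this classification step (object names and the variance cutoff are assumptions; D holds the semantic vectors of the derived words and formative is a factor with 31 levels):

    library(MASS)
    D_use   <- D[, apply(D, 2, var) > 1e-8]       # drop columns with very small variance
    lda_fit <- lda(x = D_use, grouping = formative)
    mean(predict(lda_fit)$class == formative)     # proportion correct (0.72 reported above)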

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across 10 permutations: 0.519–0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.


Figure 2: Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes (individual words will become legible by zooming in on the figure at maximal magnification).


How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and at the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype: the vector is close to the cloud of semantic vectors, but it lies outside its periphery. This pattern is not specific to ness, but is found for the other derivational lexomes as well. It is intrinsic to our model.

The reason for this is straightforward. During learning, although for a derived word its lexome i and the derivational lexome ness are co-present cues, the derivational lexome also occurs in many other words j, and each time another word j is encountered, the weights from ness to i are reduced. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property that allows our model to capture morphological productivity in comprehension and speech production.


Figure 3: Heatmap for the correlation matrix of lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)


When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed over the vectors obtained by subtracting the vector of the base word from that of the derived word, the result is a vector that is embedded inside the cluster of derived vectors, and hence inherits semantic idiosyncrasies from all these derived words.
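The contrast between the two estimates can be seen directly; in the sketch below, the word pairs and the row label "NESS" for the derivational lexome are assumptions made for illustration:

    # Additive (Mitchell-and-Lapata-style) estimate of the 'ness' vector:
    pairs <- data.frame(derived = c("happiness", "darkness", "kindness"),
                        base    = c("happy",     "dark",     "kind"),
                        stringsAsFactors = FALSE)
    ness_additive <- colMeans(S[pairs$derived, ] - S[pairs$base, ])
    cor(ness_additive, S["happiness", ])   # tends to be positive: inside the cluster
    cor(S["NESS", ], S["happiness", ])     # learned lexome: negative, an antiprototype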

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs consisting of a base and a novel derived form (e.g., accent/accentable) reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs due to the constraint that the base words had to occur in the TASA corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, i.e., the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, its city-block distance from the origin), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates a high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].
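Given a semantic vector, activation diversity is simply its L1-norm; for a derivational lexome (the row label "NESS" below is an assumed name):

    activation_diversity <- function(v) sum(abs(v))   # L1-norm, i.e., city-block length
    activation_diversity(S["NESS", ])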

Activation diversity interacted with word length. Greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies more strong links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive and were therefore left out of the specification of the model reported here (word frequency was not included as a predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve the model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by comparing (i) the semantic vector of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = −0.66), effectiveness (r = −0.51), awareness (r = −0.50), loneliness (r = −0.45), sickness (r = −0.44), and consciousness (r = −0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) with its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
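A minimal helper for this comparison, under the assumption that base words, derived words, and the affix all have row vectors in S (row labels assumed):

    # Correlate the compositional estimate (base + affix) with the word's actual vector;
    # strongly negative values flag semantically opaque derived words.
    transparency <- function(derived, base, affix = "NESS") {
      cor(S[base, ] + S[affix, ], S[derived, ])
    }
    transparency("business", "busy")     # strongly negative (opaque)
    transparency("kindness", "kind")     # expected to be less negative (more transparent)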

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the TASA corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall transparency ratings decrease as activation diversity increases. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output. (A package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz; the package is described in Baayen et al. [85].) We begin by introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension, and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

\[
\mathbf{C} =
\begin{array}{r|cccccccc}
 & \#\text{wV} & \text{wVn} & \text{Vn}\# & \#\text{tu} & \text{tu}\# & \#\text{Tr} & \text{Tri} & \text{ri}\# \\
\hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}
\tag{12}
\]

where we use the DISC keyboard phonetic alphabet (the "DIstinct Single Character" representation, with one character for each phoneme, was introduced by CELEX [86]); for triphones, the symbol # denotes the word boundary. Suppose that the semantic vectors for these words are the row vectors


Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients              Estimate   Std. Error   t-value   p-value
Intercept                               −1.8072     2.7560      −0.6557    0.5127
Word Length                              0.5901     0.2113       2.7931    0.0057
Activation Diversity                     1.1221     0.6747       1.6632    0.0976
Arousal                                  0.3847     0.1521       2.5295    0.0121
Word Length × Activation Diversity      −0.1318     0.0500      −2.6369    0.0089
B. smooth terms                          edf        Ref.df       F-value   p-value
Random Intercepts Affix                  3.3610     4.0000       55.6723   < 0.0001

of the following matrix S:

\[
\mathbf{S} =
\begin{array}{r|ccc}
 & \text{one} & \text{two} & \text{three} \\
\hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array}
\tag{13}
\]

We are interested in a transformation matrix F such that

CF = S (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S. (15)
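The toy example can be reproduced in R in a few lines (our own illustration of equations (12)-(16), using MASS::ginv as mentioned above; the object is named Fmat rather than F to avoid masking R's built-in alias for FALSE):

    library(MASS)
    cues  <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
    words <- c("one", "two", "three")
    C <- matrix(c(1,1,1,0,0,0,0,0,
                  0,0,0,1,1,0,0,0,
                  0,0,0,0,0,1,1,1),
                nrow = 3, byrow = TRUE, dimnames = list(words, cues))
    S <- matrix(c(1.0, 0.3, 0.4,
                  0.2, 1.0, 0.1,
                  0.1, 0.1, 1.0),
                nrow = 3, byrow = TRUE, dimnames = list(words, words))
    Fmat <- ginv(C) %*% S     # equation (15)
    round(C %*% Fmat, 2)      # reproduces S, cf. equation (14)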

For the present example,

\[
\mathbf{F} =
\begin{array}{r|ccc}
 & \text{one} & \text{two} & \text{three} \\
\hline
\#\text{wV}  & 0.33 & 0.10 & 0.13 \\
\text{wVn}   & 0.33 & 0.10 & 0.13 \\
\text{Vn}\#  & 0.33 & 0.10 & 0.13 \\
\#\text{tu}  & 0.10 & 0.50 & 0.05 \\
\text{tu}\#  & 0.10 & 0.50 & 0.05 \\
\#\text{Tr}  & 0.03 & 0.03 & 0.33 \\
\text{Tri}   & 0.03 & 0.03 & 0.33 \\
\text{ri}\#  & 0.03 & 0.03 & 0.33
\end{array}
\tag{16}
\]

and for this simple example, CF is exactly equal to S (the entries of F are shown rounded to two decimals). In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as for auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams.


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words; te: tensor product smooth, s: thin plate regression spline.

A. parametric coefficients                          Estimate   Std. Error   t-value    p-value
intercept [location]                                 5.5016     0.2564       21.4537   < 0.0001
intercept [scale]                                   −0.4135     0.0401      −10.3060   < 0.0001
B. smooth terms                                      edf        Ref.df       F-value   p-value
te(activation diversity, word length) [location]     5.3879     6.2897       23.7798     0.0008
s(derivational lexome) [location]                   14.3180    15.0000      380.6254   < 0.0001
s(activation diversity) [scale]                      2.1282     2.5993       28.4336   < 0.0001
s(word length) [scale]                               1.7722     2.1403      132.7311   < 0.0001

A comparison of the performance of the two models sheds light on the role of words' "sound image" in word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have the additional advantage that, compared to phonemes, they do better justice to phonetic contextual interdependencies, such as plosives being differentiated primarily by the formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝᵢ is more strongly correlated with the target semantic vector sᵢ than with any other target vector sⱼ, across all words j.
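Continuing with the toy example (or with full-scale C, Fmat, and S matrices), this evaluation can be sketched as:

    S_hat <- C %*% Fmat                          # predicted semantic vectors, one row per word
    corrs <- cor(t(S_hat), t(S))                 # entry [i, j] = cor(predicted i, target j)
    predicted <- rownames(S)[max.col(corrs)]     # best-correlated target for each word
    mean(predicted == rownames(S_hat))           # proportion of words correctly recognized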

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, for a total of 11,480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP [98]), (iii) the word being available in the CELEX database, and the word being of the abovementioned morphological type. In the present study we did not include inflected variants of derived words, nor did we include compounds.

Complexity 15

the word being available in the CELEX database and theword being of the abovementioned morphological type Inthe present study we did not include inflected variants ofderived words nor did we include compounds

4.2.1. The Direct Route Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset there were a total of 3465 unique trigrams, resulting in an 11,480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10^-7. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.
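A minimal R sketch of this set-up is given below; the toy objects, the helper function trigrams_of, and the use of '#' as a word-boundary marker are illustrative assumptions, not the implementation used in the present study.

    library(MASS)  # ginv(): Moore-Penrose generalized inverse

    words <- c("one", "two", "three")                       # toy word forms
    S <- matrix(rnorm(length(words) * 5), nrow = length(words),
                dimnames = list(words, NULL))               # toy semantic vectors

    trigrams_of <- function(w) {                            # letter trigrams of a word
      w <- paste0("#", w, "#")
      sapply(1:(nchar(w) - 2), function(i) substr(w, i, i + 2))
    }
    cues <- unique(unlist(lapply(words, trigrams_of)))

    # binary orthographic cue matrix C: which trigrams occur in which words
    C <- t(sapply(words, function(w) as.numeric(cues %in% trigrams_of(w))))
    colnames(C) <- cues

    Fmat  <- ginv(C) %*% S    # transformation matrix F
    S_hat <- C %*% Fmat       # estimated semantic vectors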

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; for the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, the performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were, in the mean, well correlated with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6595) than derived words (898), whereas the number of different inflectional functions (7) was much smaller than the number of derivational functions (24). However, the negative correlations of the predicted semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncracies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology also plays a role during the reading process. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100-105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams. (In our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion.) We therefore ran the model again, with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping an 11,480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words, the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have, for each word, two estimated semantic vectors: a vector ŝ1, obtained with network F of the first route, and a vector ŝ2, obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11,480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
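Assuming hypothetical matrices S1_hat and S2_hat that hold the row-wise predicted semantic vectors of the two routes, and the semantic matrix S with lexomes as columns, the three measures can be computed along the following lines (an illustrative sketch only):

    # total activation diversity: sum of the L1-norms of the two predicted vectors
    total_activation_diversity <- rowSums(abs(S1_hat)) + rowSums(abs(S2_hat))

    # route congruency: per-word correlation of the two predicted vectors
    route_congruency <- sapply(seq_len(nrow(S1_hat)),
                               function(i) cor(S1_hat[i, ], S2_hat[i, ]))

    # prior: L1-norm of a lexome's column vector in S
    prior <- colSums(abs(S))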

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel): for all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
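A sketch of how such a model can be specified with the mgcv package, assuming a hypothetical data frame blp that contains the inverse-transformed latencies and the predictors just described:

    library(mgcv)

    m_ldl <- bam(invRT ~ word_type + word_length +
                   s(total_activation_diversity) +   # U-shaped main effect
                   te(route_congruency, prior),      # nonlinear interaction
                 data = blp)
    summary(m_ldl)   # cf. Table 5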

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RTs in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity increases (left panel). Total activation also interacted with the total prior (right panel): for medium-range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictivity of the ldl measures for human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increases by 10059 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increases by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.




Figure 5: The partial effects of total activation diversity (left) and of the interaction of route congruency (self-correlation) and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients          Estimate    Std. Error   t-value     p-value
intercept                           -1.7774     0.0087       -205.414    < 0.0001
word type: Inflected                 0.1110     0.0059         18.742    < 0.0001
word type: Monomorphemic             0.0233     0.0054          4.322    < 0.0001
word length                          0.0117     0.0011         10.981    < 0.0001

B. smooth terms                      edf         Ref.df       F           p-value
s(total activation diversity)        6.002       7.222        23.6       < 0.0001
te(route congruency, prior)          14.673      17.850       21.37      < 0.0001


Figure 6: The interaction of total activation by total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients                      Estimate    Std. Error   t-value     p-value
intercept                                       -1.6934     0.0086       -195.933    < 0.0001
word type: Inflected                             0.0286     0.0052          5.486    < 0.0001
word type: Monomorphemic                         0.0042     0.0055          0.770    = 0.44
word length                                      0.0066     0.0011          5.893    < 0.0001

B. smooth terms                                  edf         Ref.df       F           p-value
te(activation, activation diversity, prior)      41.34       49.72        10.84      < 0.0001

The superiority of the second route may in part be due to the number of triphone features being larger than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy (92%) with which orthographic trigram vectors are mapped onto phonological triphone vectors.

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, whereas the text-to-phonology route of their model has as its primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference, however, is that in our model, words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112-114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises of what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL-scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk, and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.




Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [kʰraısıs] (right).


Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts, provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts, where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, for a total of 131,673 word tokens (representing 4779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus, (3 8; 7 2)^T = (3 7; 8 2). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus, a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

CF = S

C^T C F = C^T S

(C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S

F = (C^T C)^{-1} C^T S.   (17)

In this way, matrix inversion is required only for a much smaller matrix, C^T C, which is a square matrix of size 40,639 × 40,639.
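In R, this short-cut amounts to solving the normal equations directly, as in the following sketch with small dense toy matrices (the real C is far too large for dense storage and would in practice require sparse matrix machinery):

    set.seed(1)
    C <- matrix(rbinom(200 * 50, 1, 0.1), nrow = 200)   # toy cue matrix
    S <- matrix(rnorm(200 * 30), nrow = 200)             # toy semantic matrix

    # F = (C'C)^{-1} C'S, without computing a generalized inverse of C itself
    Fmat  <- solve(crossprod(C), crossprod(C, S))
    S_hat <- C %*% Fmat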

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the number of correct recognitions divided by the total number of audio tokens, was 33.61%. For the correctly identified words, the mean correlation was 0.72; for the incorrectly identified words, it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model; when presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11,480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task for human listeners as well [121-123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as the elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience,

S =
              one    two    three
    one       1.0    0.3    0.4
    two       0.2    1.0    0.1
    three     0.1    0.1    1.0          (18)

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T =
              #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
    one        1     1     1     0     0     0     0     0
    two        0     0     0     1     1     0     0     0
    three      0     0     0     0     0     1     1     1      (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T.   (20)

Given G, we can predict, for any semantic vector s, the vector of triphones t̂ that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t̂ = sG.   (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T,   (22)

we have

G = S′T.   (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂.   (24)

For the present example, the inverse S′ of S is

S′ =
              one     two     three
    one       1.10   -0.29   -0.41
    two      -0.21    1.07   -0.02
    three    -0.09   -0.08    1.04          (25)

and the transformation matrix G is

G =
              #wV    wVn    Vn#    #tu    tu#    #Tr    Tri    ri#
    one       1.10   1.10   1.10  -0.29  -0.29  -0.41  -0.41  -0.41
    two      -0.21  -0.21  -0.21   1.07   1.07  -0.02  -0.02  -0.02
    three    -0.09  -0.09  -0.09  -0.08  -0.08   1.04   1.04   1.04      (26)

For this simple example T is virtually identical to T Forrealistic data T will not be identical to T but will be anapproximation of it that is optimal in the least squares senseThe triphones with the strongest support are expected to bethe most likely to be the triphones defining a wordrsquos form

We made use of the S matrix, which we derived from the TASA corpus as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies, for each of w words, an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m); those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
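A simplified sketch of this ordering step with igraph is given below, for the triphones of the word one; the edge-construction rule (the last two phones of one triphone must match the first two phones of the next) and the objects used are illustrative simplifications of the algorithm described in the Appendix.

    library(igraph)

    active <- c("#wV", "wVn", "Vn#")   # triphones with activation > 0.99

    # connect triphone a to triphone b if they overlap in two phones
    pairs <- expand.grid(a = active, b = active, stringsAsFactors = FALSE)
    pairs <- pairs[substr(pairs$a, 2, 3) == substr(pairs$b, 1, 2), ]
    g <- graph_from_edgelist(as.matrix(pairs), directed = TRUE)

    # collect all simple paths starting from a word-initial (left-edge) triphone
    starts <- active[substr(active, 1, 1) == "#"]
    paths  <- unlist(lapply(starts, function(v) all_simple_paths(g, from = v)),
                     recursive = FALSE)

    # the longest path spells out the word's triphone sequence
    best <- paths[[which.max(sapply(paths, length))]]
    as_ids(best)   # "#wV" "wVn" "Vn#"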

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the Appendix, and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second, due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of the monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_ma from S_m and add the semantic vector s_a of the affix:

S_a = S_ma + i ⊗ s_a.   (27)

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a,

S_a G_a = T_a,   (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

    [ S_m ]          [ T_m ]
    [ S_a ]  G_a  =  [ T_a ].   (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function, a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for the base words and all inflected words jointly:

    [ S_m  ]          [ T_m  ]
    [ S_a1 ]          [ T_a1 ]
    [ S_a2 ]   G   =  [ T_a2 ]
    [ ...  ]          [ ...  ]
    [ S_an ]          [ T_an ].   (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping; a schematic implementation is sketched below.
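A sketch of how such an augmented mapping can be assembled; the objects S_m and T_m (matrices for the monolexomic words), s_affix (a named list of inflectional semantic vectors), base_of (for each inflectional function, the base-word rows attested with it), and T_a (the corresponding triphone matrices) are hypothetical placeholders.

    S_aug <- S_m
    T_aug <- T_m
    for (a in names(s_affix)) {
      # semantic vectors of inflected forms: base vector plus affix vector, cf. (27)
      S_a <- S_m[base_of[[a]], , drop = FALSE] +
             matrix(s_affix[[a]], nrow = length(base_of[[a]]),
                    ncol = ncol(S_m), byrow = TRUE)
      S_aug <- rbind(S_aug, S_a)
      T_aug <- rbind(T_aug, T_a[[a]])
    }
    # one network G for all inflectional functions jointly, cf. (30)
    G <- MASS::ginv(S_aug) %*% T_aug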

We selected 6595 inflected variants, which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the TreeTagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10,582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for the inflected words. One source of errors for the production algorithm is inconsistent coding in the CELEX database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with CELEX phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Dz], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones, without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), for a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the CELEX target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted -ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.



Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests that variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced ldl edge weight (in the present example, the d in ndI, henceforth the 'branching segment'), for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15,105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as the predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL edge weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
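A sketch of this generalized additive mixed model in mgcv, assuming a hypothetical data frame buckeye with one row per word token and the predictors named in Table 7:

    library(mgcv)

    m_dur <- bam(log_rel_duration ~ log_ldl_edge_weight + phrase_rate + log_ncount +
                   s(log_frequency) +          # thin plate regression spline
                   s(word, bs = "re") +        # random intercepts for word
                   s(speaker, bs = "re"),      # random intercepts for speaker
                 data = buckeye)
    summary(m_dur)   # cf. Table 7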

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133-138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).



Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients       Estimate     Std. Error   t-value     p-value
Intercept                        -1.9165      0.0592       -32.3908    < 0.0001
Log LDL Edge Weight              -0.1358      0.0400        -3.3960      0.0007
Phrase Rate                       0.0047      0.0021         2.2449      0.0248
Log Ncount                        0.1114      0.0189         5.8837    < 0.0001

B. smooth terms                  edf          Ref.df       F-value     p-value
TPRS log Frequency               1.9007       1.9050        7.5495       0.0004
random intercepts word           1149.8839    1323.0000    31.4919     < 0.0001
random intercepts speaker        30.2995      39.0000      34.5579     < 0.0001

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs, [139-141]). TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can easily be enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4. Discussion. Our production network reconstructs known words' forms with high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm, but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the CELEX database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from CELEX and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the TASA corpus, and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the WEAVER model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid graphs becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory, but a dynamic one. It is not a repository of stored forms; rather, it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncracies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncracies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlək]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise, articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically, compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
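The logic of this illustration can be sketched on a miniature scale in R; the semantic vectors, the handful of tokens, and the helper function below are hypothetical, and this is not the simulation underlying Figure 10.

    library(MASS)

    lexomes <- c("JohnClark", "JohnWelsch", "JanetClark")
    set.seed(1)
    S <- matrix(rnorm(3 * 10), nrow = 3, dimnames = list(lexomes, NULL))

    # word tokens as read, each paired with the lexome of its named entity
    tokens <- data.frame(
      word   = c("John", "Clark", "John", "Welsch", "Janet", "Clark"),
      lexome = c("JohnClark", "JohnClark", "JohnWelsch",
                 "JohnWelsch", "JanetClark", "JanetClark"),
      stringsAsFactors = FALSE)

    trigrams_of <- function(w) {
      w <- paste0("#", w, "#")
      sapply(1:(nchar(w) - 2), function(i) substr(w, i, i + 2))
    }
    cues <- unique(unlist(lapply(tokens$word, trigrams_of)))
    C <- t(sapply(tokens$word, function(w) as.numeric(cues %in% trigrams_of(w))))
    L <- S[tokens$lexome, ]          # row i: semantic vector of token i's lexome

    Fmat <- ginv(C) %*% L            # solve CF = L

    john   <- as.numeric(C[1, , drop = FALSE] %*% Fmat)   # reading "John"
    welsch <- as.numeric(C[4, , drop = FALSE] %*% Fmat)   # reading "Welsch"

    round(cor(t(S), john), 2)          # uncertainty across the two Johns
    round(cor(t(S), john + welsch), 2) # JohnWelsch should now receive the strongest support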

The upper left panel of Figure 10 illustrates that, upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As, in the small world of the present toy example, Janet is a unique personal name, there is no uncertainty about which named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin; network H_a is also part of the control loop for speech production (see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production). Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

Sentence | Lexomes
John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. (Each of the four panels plots Pearson correlations on a scale from −0.5 to 1.0.)

can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored


[Figure 11 diagram: input systems (cochlea, retina) and output systems (speaking, typing); representations: auditory cues (FBS features), orthographic cues (trigrams), semantic vectors (TASA), auditory targets (triphones), and orthographic targets (trigrams); the arcs carry the network labels F_a, F_o, H_a, K_a, G_a, G_o, C_a, C_o, T_a, and T_o.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
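
As a rough sketch of what such incremental estimation looks like, the fragment below applies the Widrow-Hoff (delta) rule one learning event at a time; the learning rate eta, the binary cue vectors, and the toy dimensions are illustrative assumptions, not the settings used in the study.

```r
# Widrow-Hoff (delta-rule) updating of a cue-to-outcome weight matrix W.
# cue: 1 x k row vector of active cues (1/0); target: 1 x m semantic vector.
widrow_hoff_update <- function(W, cue, target, eta = 0.01) {
  prediction <- cue %*% W                      # current network output for this event
  W + eta * t(cue) %*% (target - prediction)   # shift weights towards the target
}

# Toy dimensions: 5 form cues, 3 semantic dimensions (illustrative only)
W <- matrix(0, nrow = 5, ncol = 3)
events <- list(
  list(cue = matrix(c(1, 1, 0, 0, 1), 1), target = matrix(c(0.2, -0.1, 0.4), 1)),
  list(cue = matrix(c(0, 1, 1, 1, 0), 1), target = matrix(c(-0.3, 0.5, 0.0), 1))
)

# Present the learning events one by one, in their temporal order
for (ev in events) {
  W <- widrow_hoff_update(W, ev$cue, ev$target)
}
```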

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures (typically represented by graphs constructed over semantic primitives; see, e.g., [154]) are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155-157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented and made to work only because a Latin verb form such as sapivissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds; learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep the problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds and what its consequences are at both sides of the conscious/unconscious divide is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (−2, −3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

a = (3, 4),   b = (−2, −3)   (A.1)


Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

A = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}   (A.2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

B = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}   (A.3)

Let us assume that points a and b represent the forms of two words, ω1 and ω2, and that the points x and y represent the meanings of these two words (below we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and, likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1x_1 + a_2y_1 & a_1x_2 + a_2y_2 \\ b_1x_1 + b_2y_1 & b_1x_2 + b_2y_2 \end{pmatrix}   (A.4)

Such a mapping of A onto B is given by a matrix F,

F = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}   (A.5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}   (A.6)

Using matrix notation, we can write

AF = B   (A.7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}   (A.8)

For numbers, the inverse of multiplication with s is dividing by s:

(1/s)(sx) = x   (A.9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

X^{-1}X = XX^{-1} = I   (A.10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

Y^{-1}(YX) = (XY)Y^{-1} = X   (A.11)

We find the square matrix F that maps the square matrix A onto B as follows:

AF = B
A^{-1}AF = A^{-1}B
IF = A^{-1}B
F = A^{-1}B   (A.12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A.6).
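
For readers who wish to check the arithmetic of this toy example, the base R lines below reproduce it; solve() can be used here because A and B are nonsingular.

```r
A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)  # form vectors a and b as rows
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)  # semantic vectors x and y as rows

Fmat <- solve(A) %*% B  # comprehension mapping F = A^{-1} B
A %*% Fmat              # reproduces B

Gmat <- solve(B) %*% A  # production mapping G = B^{-1} A (derived below)
B %*% Gmat              # reproduces A
```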

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find the matrix G that maps B onto A, i.e.,

BG = A   (A.13)

Solving this equation proceeds exactly as above:

BG = A
B^{-1}BG = B^{-1}A
IG = B^{-1}A
G = B^{-1}A   (A.14)


The inverse of B is

B^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}   (A.15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = G   (A.16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study we denote the generalized inverse of a matrix X by X′.
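
The following lines illustrate the generalized inverse in R; the singular matrix X is a made-up example.

```r
library(MASS)  # provides ginv(), the Moore-Penrose generalized inverse

X <- matrix(c(1, 2, 2, 4), nrow = 2, byrow = TRUE)  # singular: rows are proportional
# solve(X) would fail here, but the generalized inverse is defined
Xprime <- ginv(X)
X %*% Xprime %*% X  # recovers X, a defining property of the generalized inverse
```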

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3  4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5  2)   (A.17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i_1 and i_2 are set to 3 and 4, respectively. To obtain the activations o_1 = −5 and o_2 = 2 for the output vector (−5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o_1 we obtain the value 3 × 1 + 4 × (−2) = −5, and for o_2 we have 3 × 2 + 4 × (−1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (−5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

s = (2, 2)   (A.18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2  2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2  2)   (A.19)

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i_1 and i_2 (in blue), and two output nodes, o_1 and o_2 (in red); form features go in at the input nodes, semantic vectors come out at the output nodes, and the connection weights (1, 2, −2, −1) on the edges are the transformation matrix. When the input vector (3, 4) is presented to the network, the value at o_1 is 3 × 1 + 4 × (−2) = −5 and that at o_2 is 3 × 2 + 4 × (−1) = 2. The network maps the point (3, 4) onto the point (−5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol) to a word's final triphone (the triphone ending with the word boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at that vertex are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
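
The sketch below illustrates the graph construction and the length-weighted path evaluation with igraph; the toy triphone set and the support values are made up, and the helper code is an illustration of the idea rather than the implementation used in the study (for that, see the WpmWithLdl package of [85]).

```r
library(igraph)

# Toy triphone inventory for one word, with made-up network support values;
# '#' marks the word boundary (illustrative only)
support <- c("#ha" = 0.9, "han" = 0.8, "and" = 0.7, "nds" = 0.6, "ds#" = 0.5)
tri <- names(support)

# Edge between two triphones when the last two segments of the first
# are identical to the first two segments of the second
edges <- do.call(rbind, lapply(tri, function(v) {
  to <- tri[substr(tri, 1, 2) == substr(v, 2, 3)]
  if (length(to) > 0) data.frame(from = v, to = to) else NULL
}))
g <- graph_from_data_frame(edges, directed = TRUE, vertices = data.frame(name = tri))

# All paths from an initial to a final triphone, ranked by path support
# divided by path length
initial <- tri[startsWith(tri, "#")]
final   <- tri[endsWith(tri, "#")]
paths   <- all_simple_paths(g, from = initial, to = final)
weighted_support <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
as_ids(paths[[which.max(weighted_support)]])  # best path drives articulation
```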

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected


Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
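
A plot of this kind can be generated along the following lines, assuming that S is the matrix of semantic vectors with lexomes as row names; the word list is abbreviated here and the color scheme of the published figure is not reproduced.

```r
# Pairwise correlations between the semantic (row) vectors of selected
# function words, shown as a clustered heatmap
function_words <- c("we", "us", "she", "her", "about", "over", "across")  # abbreviated list
R <- cor(t(S[function_words, ]))  # correlate the rows of S
heatmap(R, symm = TRUE)
```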


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning "orthographic" representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Durdević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for test classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL. R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA; London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of F1 and F2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


predictor of that same word occurring in that sentence. When the focus is on semantic similarity, it is useful to set the main diagonal of this matrix to zero. However, for more general association strengths between words, the diagonal elements are informative and should be retained.

Unlike standard models of distributional semantics, we did not carry out any dimension reduction using, for instance, singular value decomposition. Because we do not work with latent variables, each dimension of our semantic space is given by a column vector of S, and hence each dimension is linked to a specific lexome. Thus, the row vectors of S specify, for each of the lexomes, how well this lexome discriminates between all the lexomes known to the model.

Although semantic vectors of length 23,562 can provide good results, vectors of this length can be challenging for statistical evaluation. Fortunately, it turns out that many of the column vectors of S are characterized by extremely small variance. Such columns can be removed from the matrix without loss of accuracy. In practice, we have found it suffices to work with approximately 4 to 5 thousand columns, selected by calculating column variances and using only those columns with a variance exceeding a threshold value.
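The column selection itself is a one-line operation. The following R sketch illustrates the idea; the matrix dimensions and the threshold are invented for the illustration and are not the values used for the TASA-based S.

set.seed(1)
S <- matrix(rnorm(20 * 50, sd = 0.01), nrow = 20)   # small stand-in for the 23,562-column S
colVariances <- apply(S, 2, var)                    # variance of each semantic dimension
threshold <- quantile(colVariances, 0.80)           # illustrative cutoff, not the value used in the paper
S.reduced <- S[, colVariances > threshold]          # retain only the high-variance columns
dim(S.reduced)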

3.2. Validation of the Semantic Vector Space. As shown by Baayen et al. [59] and Milin et al. [58, 77], measures based on matrices such as S are predictive for behavioral measures such as reaction times in the visual lexical decision task, as well as for self-paced reading latencies. In what follows, we first validate the semantic vectors of S on two data sets: one data set with accuracies in paired associate learning, and one dataset with semantic similarity ratings. We then consider specifically the validity of semantic vectors for inflectional and derivational functions, focusing first on the correlational structure of the pertinent semantic vectors, followed by an examination of the predictivity of the semantic vectors for semantic plausibility and semantic transparency ratings for derived words.

3.2.1. Paired Associate Learning. The paired associate learning (PAL) task is a widely used test in psychology for evaluating learning and memory. Participants are given a list of word pairs to memorize. Subsequently, at testing, they are given the first word and have to recall the second word. The proportion of correctly recalled words is the accuracy measure on the PAL task. Accuracy on the PAL test decreases with age, which has been attributed to cognitive decline over the lifetime. However, Ramscar et al. [78] and Ramscar et al. [79] provide evidence that the test actually measures the accumulation of lexical knowledge. In what follows, we use the data on PAL performance reported by des Rosiers and Ivison [80]. We fitted a linear mixed model to accuracy in the PAL task as a function of the Pearson correlation r of the paired words' semantic vectors in S (but with weights on the main diagonal included, and using the 4275 columns with highest variance), with random intercepts for word pairs, sex and age as control variables, and, crucially, an interaction of r by age. Given the findings of Ramscar and colleagues, we expect to find that the slope of r (which is always negative) as a predictor of PAL accuracy increases with age (indicating decreasing accuracy). Table 1 shows that this prediction is borne out. For age group 20–29 (the reference level of age), the slope for r is estimated at 0.31. For the next age level, this slope is adjusted upward by 0.12, and as age increases, these upward adjustments likewise increase, to 0.21, 0.24, and 0.32 for age groups 40–49, 50–59, and 60–69, respectively. Older participants know their language better and hence are more sensitive to the semantic similarity (or lack thereof) of the words that make up PAL test pairs. For the purposes of the present study, the solid support for r as a predictor for PAL accuracy contributes to validating the row vectors of S as semantic vectors.

Figure 1: Similarity rating in the MEN dataset as a function of the correlation r between the row vectors in S of the words in the test pairs.
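The structure of this model can be sketched in R with the mgcv package, which implements random intercepts as a penalized smooth (bs = "re"), matching the smooth term reported in Table 1. The data below are simulated stand-ins for the des Rosiers and Ivison norms, and the number of word pairs is an assumption made only for the illustration.

library(mgcv)
set.seed(1)
pal <- expand.grid(pair = factor(1:18),
                   age  = factor(c("20-29", "30-39", "40-49", "50-59", "60-69")),
                   sex  = factor(c("female", "male")))
pal$r   <- runif(nrow(pal), -0.49, -0.25)                  # correlation of the paired words' semantic vectors
pal$acc <- 3.6 + 0.3 * pal$r + rnorm(nrow(pal), sd = 0.3)  # simulated PAL accuracy
m.pal <- gam(acc ~ r * age + sex + s(pair, bs = "re"), data = pal)
summary(m.pal)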

3.2.2. Semantic Relatedness Ratings. We also examined the performance of S on the MEN test collection [81], which provides crowd-sourced ratings of semantic similarity for 3000 word pairs. For 2267 word pairs, semantic vectors are available in S for both words. Figure 1 shows that there is a nonlinear relation between the correlation of words' semantic vectors in S and the ratings. The plot shows as well that for low correlations, the variability in the MEN ratings is larger. We fitted a Gaussian location-scale additive model to this data set, summarized in Table 2, which supported r as a predictor for both the mean MEN rating and the variability in the MEN ratings.
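A minimal sketch of such a location-scale model, using the gaulss family of the mgcv package, is given below. The ratings are simulated, so only the model structure, not the estimates, corresponds to Table 2.

library(mgcv)
set.seed(1)
men <- data.frame(r = runif(2267, 0, 0.5))
men$rating <- 5 + 60 * men$r + rnorm(nrow(men), sd = 12 * (0.6 - men$r))
m.men <- gam(list(rating ~ s(r),   # location: the mean rating
                  ~ s(r)),         # scale: the standard deviation, via the logb link
             family = gaulss(), data = men)
summary(m.men)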

To put this performance in perspective, we collected the latent semantic analysis (LSA) similarity scores for the MEN word pairs using the website at http://lsa.colorado.edu. The Spearman correlation for LSA scores and MEN ratings was 0.697, and that for r was 0.704 (both p < 0.0001). Thus, our semantic vectors perform on a par with those of LSA, a well-established older technique that still enjoys wide use in psycholinguistics. Undoubtedly, optimized techniques from computational linguistics such as word2vec will outperform our model.


Table 1: Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [−0.488, −0.253]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

A. parametric coefficients        Estimate   Std. Error   t-value    p-value
intercept                           3.6220       0.6822    5.3096   < 0.0001
r                                   0.3058       0.1672    1.8293     0.0691
age=39                              0.2297       0.1869    1.2292     0.2207
age=49                              0.4665       0.1869    2.4964     0.0135
age=59                              0.6078       0.1869    3.2528     0.0014
age=69                              0.8029       0.1869    4.2970   < 0.0001
sex=male                           -0.1074       0.0230   -4.6638   < 0.0001
r:age=39                            0.1167       0.0458    2.5490     0.0117
r:age=49                            0.2090       0.0458    4.5640   < 0.0001
r:age=59                            0.2463       0.0458    5.3787   < 0.0001
r:age=69                            0.3239       0.0458    7.0735   < 0.0001
B. smooth terms                        edf       Ref.df   F-value    p-value
random intercepts word pair        17.8607      18.0000  128.2283   < 0.0001

Table 2: Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location parameter: estimating the mean; scale parameter: estimating the variance through the logb link function (η = log(σ − b)).
A. parametric coefficients        Estimate   Std. Error    t-value    p-value
Intercept [location]               25.2990       0.1799   140.6127   < 0.0001
Intercept [scale]                   2.1297       0.0149   143.0299   < 0.0001
B. smooth terms                        edf       Ref.df    F-value    p-value
TPRS r [location]                   8.3749       8.8869  2766.2399   < 0.0001
TPRS r [scale]                      5.9333       7.0476    87.5384   < 0.0001

The important point here is that even with training on full sentences rather than using small windows, and even when including function words, the performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.

3.2.3. Correlational Structure of Inflectional and Derivational Vectors. Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner and tend to be either not correlated with derivational lexomes (indicated by light yellow) or to be negatively correlated (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show small positive correlations for a majority of pairs, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3500 derived words with 31 different formatives, resulting in a 3500 × 23,562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82] (for LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3531 × 814). LDA accuracy for this classification task with 31 possible outcomes was 0.72. All 31 derivational lexomes were correctly classified.

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across 10 permutations: 0.519–0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.
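The logic of this classification test can be sketched with the lda function as follows. The matrix D below is simulated and uses only three formatives, so the accuracies it yields are illustrative only.

library(MASS)
set.seed(1)
n <- 300; k <- 20                                    # 300 derived words, 20 semantic dimensions
formative <- factor(sample(c("NESS", "LESS", "ABLE"), n, replace = TRUE))
D <- matrix(rnorm(n * k), n, k) +
     model.matrix(~ formative - 1) %*% matrix(rnorm(3 * k), 3, k)   # inject some class structure
fit <- lda(D, grouping = formative)
mean(predict(fit, D)$class == formative)             # classification accuracy
permuted <- sample(formative)                        # permutation baseline: shuffle the formatives
mean(predict(lda(D, grouping = permuted), D)$class == permuted)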


Figure 2: Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes (individual words will become legible by zooming in on the figure at maximal magnification).

How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and at the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype: this vector is close to the cloud of semantic vectors, but it is outside its periphery. This pattern is not specific to ness but is found for the other derivational lexomes as well. It is intrinsic to our model.

The reason for this is straightforward. During learning, although for a derived word its lexome i and its derivational lexome ness are co-present cues, the derivational lexome also occurs in many other words j, and each time another word j is encountered, weights are reduced from ness to i. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property of our model for capturing morphological productivity in comprehension and speech production.


Figure 3: Heatmap for the correlation matrix of lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed for the vectors obtained by subtracting the vector of the derived word from that of the base word, the result is a vector that is embedded inside the cluster of derived vectors and hence inherits semantic idiosyncrasies from all these derived words.

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs consisting of a base and a novel derived form (e.g., accent/accentable) reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs due to the constraint that the base words had to occur in the TASA corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates a high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases, and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies stronger links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive and were therefore left out of the specification of the model reported here (word frequency was not included as a predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve the model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by comparing the correlation of (i) the semantic vector of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = −0.66), effectiveness (r = −0.51), awareness (r = −0.50), loneliness (r = −0.45), sickness (r = −0.44), and consciousness (r = −0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) with its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
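Computationally, this comparison amounts to a single correlation per derived word. The following sketch uses random vectors in place of the actual rows of S; only the form of the computation is intended to carry over.

set.seed(1)
S <- rbind(business = rnorm(10), busy = rnorm(10), NESS = rnorm(10))  # toy stand-ins for rows of S
composed <- S["busy", ] + S["NESS", ]       # vector expected if the derived word were fully transparent
cor(composed, S["business", ])              # transparency measure: correlation of composed and actual vector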

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the TASA corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall, transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output (a package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz; the package is described in Baayen et al. [85]). We begin by introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension, and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

C =
            #wV  wVn  Vn#  #tu  tu#  #Tr  Tri  ri#
    one       1    1    1    0    0    0    0    0
    two       0    0    0    1    1    0    0    0
    three     0    0    0    0    0    1    1    1          (12)

where we use the DISC keyboard phonetic alphabet (the "DIstinct Single Character" representation, with one character for each phoneme, was introduced by CELEX [86]); for triphones, the symbol # denotes the word boundary. Suppose that the semantic vectors for these words are the row vectors


Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients                 Estimate   Std. Error   t-value   p-value
Intercept                                   -1.8072       2.7560   -0.6557    0.5127
Word Length                                  0.5901       0.2113    2.7931    0.0057
Activation Diversity                         1.1221       0.6747    1.6632    0.0976
Arousal                                      0.3847       0.1521    2.5295    0.0121
Word Length × Activation Diversity          -0.1318       0.0500   -2.6369    0.0089
B. smooth terms                                 edf       Ref.df   F-value   p-value
Random Intercepts Affix                      3.3610       4.0000   55.6723  < 0.0001

of the following matrix S:

S =
             one   two  three
    one     1.00  0.30   0.40
    two     0.20  1.00   0.10
    three   0.10  0.10   1.00          (13)

We are interested in a transformation matrix F such that

CF = S.          (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S.          (15)

For the present example

F =
             one   two  three
    #wV     0.33  0.10   0.13
    wVn     0.33  0.10   0.13
    Vn#     0.33  0.10   0.13
    #tu     0.10  0.50   0.05
    tu#     0.10  0.50   0.05
    #Tr     0.03  0.03   0.33
    Tri     0.03  0.03   0.33
    ri#     0.03  0.03   0.33          (16)

and for this simple example, CF is exactly equal to S.

In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as for auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams.
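The toy example of equations (12)-(16) can be reproduced in a few lines of R with the ginv function of the MASS package; this is a sketch, with S entered as given in (13).

library(MASS)
C <- rbind(one   = c(1, 1, 1, 0, 0, 0, 0, 0),
           two   = c(0, 0, 0, 1, 1, 0, 0, 0),
           three = c(0, 0, 0, 0, 0, 1, 1, 1))
colnames(C) <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
S <- rbind(one   = c(1.00, 0.30, 0.40),
           two   = c(0.20, 1.00, 0.10),
           three = c(0.10, 0.10, 1.00))
Fmat <- ginv(C) %*% S     # F = C'S, cf. equation (15)
round(Fmat, 2)            # matches the matrix in (16) up to rounding
C %*% Fmat                # reproduces S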


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words. te: tensor product smooth; s: thin plate regression spline.

A. parametric coefficients                            Estimate   Std. Error   t-value   p-value
intercept [location]                                    5.5016       0.2564   21.4537  < 0.0001
intercept [scale]                                      -0.4135       0.0401  -10.3060  < 0.0001
B. smooth terms                                            edf       Ref.df   F-value   p-value
te(activation diversity, word length) [location]        5.3879       6.2897   23.7798    0.0008
s(derivational lexome) [location]                      14.3180      15.0000  380.6254  < 0.0001
s(activation diversity) [scale]                         2.1282       2.5993   28.4336  < 0.0001
s(word length) [scale]                                  1.7722       2.1403  132.7311  < 0.0001

A comparison of the performance of the two models sheds light on the role of words' "sound image" in word recognition during reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have the additional advantage that, compared to phonemes, they do better justice to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is more strongly correlated with its target semantic vector s_i than with any of the other target vectors s_j across all words j.
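In code, this evaluation is a row-wise argmax over a correlation matrix. The sketch below uses small random matrices in place of the real cue and semantic matrices, so the accuracy it reports is meaningless in itself.

library(MASS)
set.seed(1)
C <- matrix(rbinom(10 * 30, 1, 0.3), nrow = 10)   # 10 toy words, 30 form cues
S <- matrix(rnorm(10 * 15), nrow = 10)            # 10 toy words, 15 semantic dimensions
Fmat <- ginv(C) %*% S                             # form-to-meaning network
Shat <- C %*% Fmat                                # predicted semantic vectors
corrs <- cor(t(Shat), t(S))                       # rows: predictions; columns: targets
best  <- apply(corrs, 1, which.max)               # best-correlated target for each word
mean(best == seq_len(nrow(S)))                    # proportion of words recognized correctly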

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, for a total of 11,480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP [98]), (iii) the word being available in the CELEX database, and (iv) the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset there were a total of 3465 unique trigrams, resulting in an 11,480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10^-7. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; for the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, the performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated, in the mean, with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = −0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derivational semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route, from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping an 11,480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual-route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks: one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words, the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90 and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ1 obtained with network F of the first route, and a vector ŝ2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
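The two routes can be sketched with the same linear machinery. In the code below the matrices are random toys; Co, Cp, Tm, and H are our labels for the trigram cue matrix, the triphone cue matrix, the predicted triphone matrix T, and the second-route network.

library(MASS)
set.seed(1)
n  <- 10
Co <- matrix(rbinom(n * 40, 1, 0.3), nrow = n)    # orthographic cues (letter trigrams)
Cp <- matrix(rbinom(n * 50, 1, 0.3), nrow = n)    # phonological cues (triphones)
S  <- matrix(rnorm(n * 15), nrow = n)             # semantic vectors
Fmat  <- ginv(Co) %*% S                           # route 1: orthography -> semantics
K     <- ginv(Co) %*% Cp                          # orthography -> triphones
Tm    <- Co %*% K                                 # predicted triphone support (the matrix T)
H     <- ginv(Tm) %*% S                           # predicted triphones -> semantics
s1hat <- Co %*% Fmat                              # route 1 estimates of the semantic vectors
s2hat <- Tm %*% H                                 # route 2 estimates of the semantic vectors
diag(cor(t(s1hat), t(s2hat)))                     # per-word agreement of the two routes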

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11,480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
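The three measures reduce to simple vector operations, as sketched below with random placeholder matrices; which column of S belongs to which word's lexome is left implicit here.

set.seed(1)
s1hat <- matrix(rnorm(10 * 15), nrow = 10)   # route 1 estimates (toy values)
s2hat <- matrix(rnorm(10 * 15), nrow = 10)   # route 2 estimates (toy values)
S     <- matrix(rnorm(10 * 15), nrow = 10)   # semantic matrix (toy values)
total.activation.diversity <- rowSums(abs(s1hat)) + rowSums(abs(s2hat))  # summed L1-norms, per word
route.congruency <- diag(cor(t(s1hat), t(s2hat)))                        # correlation of the two estimates
prior <- colSums(abs(S))                                                 # L1-norm of each column vector of S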

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel): for all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
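The structure of the GAM of Table 5 can be sketched as follows; the response latencies and predictors are simulated, and the variable names are ours rather than those of the BLP files.

library(mgcv)
set.seed(1)
blp <- data.frame(invRT      = rnorm(2000),
                  word.type  = factor(sample(c("derived", "inflected", "monolexomic"),
                                             2000, replace = TRUE)),
                  length     = sample(3:12, 2000, replace = TRUE),
                  tad        = runif(2000, 0, 5),      # total activation diversity
                  congruency = runif(2000, 0, 1),      # route congruency
                  prior      = runif(2000, 0.5, 2))
m.rt <- gam(invRT ~ word.type + length + s(tad) + te(congruency, prior), data = blp)
summary(m.rt)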

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel): for medium range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increased by 10059 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increased by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger


Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients           Estimate   Std. Error    t-value   p-value
intercept                             -1.7774       0.0087   -205.414  < 0.0001
word type=Inflected                    0.1110       0.0059     18.742  < 0.0001
word type=Monomorphemic                0.0233       0.0054      4.322  < 0.0001
word length                            0.0117       0.0011     10.981  < 0.0001
B. smooth terms                           edf       Ref.df          F   p-value
s(total activation diversity)           6.002        7.222       23.6  < 0.0001
te(route congruency, prior)            14.673       17.850      213.7  < 0.0001


Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients        Estimate    Std. Error    t-value     p-value
intercept                         -1.6934     0.0086        -195.933    < 0.0001
word type = Inflected              0.0286     0.0052           5.486    < 0.0001
word type = Monomorphemic          0.0042     0.0055           0.770    = 0.44
word length                        0.0066     0.0011           5.893    < 0.0001

B. smooth terms                                  edf       Ref.df    F        p-value
te(activation, activation diversity, prior)      41.34     49.72     10.84    < 0.0001

The superiority of the second route may in part be due to the number of triphone features being larger than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations; https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk, and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.
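The following is an illustrative sketch, not the AcousticNDLCodeR implementation, of how one such discrete summary feature can be assembled once a discretized MEL spectrum and the chunk boundaries are available; the object names (spec, chunks) are assumptions of this illustration.

    # 'spec' is assumed to be a 21 x time matrix of discretized intensity levels
    # (1-5), 'cols' the column indices of one Hilbert-envelope chunk, and
    # 'chunk_id' the number of that chunk.
    fbsf_for_chunk <- function(spec, cols, chunk_id) {
      sapply(seq_len(nrow(spec)), function(band) {
        x <- spec[band, cols]
        paste0("band", band,
               "-start",  x[1],
               "-median", round(median(x)),
               "-min",    min(x),
               "-max",    max(x),
               "-end",    x[length(x)],
               "-part",   chunk_id)
      })
    }
    # e.g., fbsf_for_chunk(spec, chunks[[2]], 2) yields strings such as
    # "band11-start3-median3-min2-max5-end3-part2"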




Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).


Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, is shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

We tested ldl on 20 hours of speech sampled from the audio files of the ucla library broadcast newsscape data, a vast repository of multimodal TV news broadcasts provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a, with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, with C′ the generalized inverse of C, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus (3 8; 7 2)^T = (3 7; 8 2). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

CF = S
C^T C F = C^T S
(C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
F = (C^T C)^{-1} C^T S    (17)

In this way, matrix inversion is required only for the much smaller matrix C^T C, which is a square matrix of size 40639 × 40639.
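A minimal sketch of this estimation step in R, assuming the cue matrix C and the semantic matrix S are available as numeric matrices:

    # C: cue matrix (audio tokens x FBSFs); S: semantic matrix (audio tokens x
    # semantic dimensions). crossprod(C) computes t(C) %*% C without explicitly
    # forming the transpose.
    Fmat  <- solve(crossprod(C), crossprod(C, S))   # F = (C'C)^{-1} C'S, eq. (17)
    S_hat <- C %*% Fmat                             # estimated semantic vectors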

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the number of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.
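The evaluation criterion can be made explicit with the following sketch, assuming matrices S_hat (predicted) and S_gold (targeted) with one row per audio token; with 131673 tokens the full correlation matrix is of course very large, so this is meant only as an illustration of the criterion, not as an efficient implementation.

    # a token counts as recognized if its predicted vector correlates most
    # strongly with its own gold vector
    r        <- cor(t(S_hat), t(S_gold))          # token-by-token correlations
    best     <- apply(r, 1, which.max)            # best-correlating gold vector
    accuracy <- mean(best == seq_len(nrow(S_hat)))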

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11,480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

S =
            one     two     three
one         1.0     0.3     0.4
two         0.2     1.0     0.1
three       0.1     0.1     1.0          (18)

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T =
            #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
one          1     1     1     0     0     0     0     0
two          0     0     0     1     1     0     0     0
three        0     0     0     0     0     1     1     1          (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict for any semantic vector s the vector of triphones t̂ that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t̂ = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T (22)

we have

G = S′T (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂ (24)

For the present example, the generalized inverse S′ of S is

S′ =
            one      two      three
one         1.10    -0.29    -0.41
two        -0.21     1.07    -0.02
three      -0.09    -0.08     1.04          (25)

and the transformation matrix G is

G =
            #wV     wVn     Vn#     #tu     tu#     #Tr     Tri     ri#
one         1.10    1.10    1.10   -0.29   -0.29   -0.41   -0.41   -0.41
two        -0.21   -0.21   -0.21    1.07    1.07   -0.02   -0.02   -0.02
three      -0.09   -0.09   -0.09   -0.08   -0.08    1.04    1.04    1.04          (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.

We made use of the S matrix which we derived from the tasa corpus, as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.
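A compact sketch of this set-up in R, assuming a dense semantic matrix S and a triphone indicator matrix Tmat (the paper's T); MASS::ginv computes the Moore-Penrose generalized inverse used in (23):

    library(MASS)   # for ginv()
    keep  <- order(apply(S, 2, var), decreasing = TRUE)[seq_len(ncol(Tmat))]
    S_red <- S[, keep]              # retain the highest-variance columns of S
    G     <- ginv(S_red) %*% Tmat   # transformation matrix, eq. (23)
    T_hat <- S_red %*% G            # predicted support for the triphones, eq. (24)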

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.
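The following sketch illustrates the core of such a graph-based reconstruction with the igraph package; it omits the thresholds and the length penalty of the full algorithm described in the appendix, and the use of "#" to mark word-initial triphones is an assumption of this illustration.

    library(igraph)
    # 'tri' is assumed to hold the triphones whose predicted support exceeds the
    # threshold (e.g., 0.99) for one word.
    succ  <- lapply(tri, function(a) tri[substr(tri, 1, 2) == substr(a, 2, 3)])
    edges <- do.call(rbind, Map(function(a, s) if (length(s)) cbind(a, s), tri, succ))
    g     <- graph_from_edgelist(edges)             # directed triphone graph
    start <- intersect(tri[substr(tri, 1, 1) == "#"], V(g)$name)
    paths <- unlist(lapply(start, function(v) all_simple_paths(g, from = v)),
                    recursive = FALSE)
    form  <- paths[[which.max(lengths(paths))]]     # longest path = candidate form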

From these analyses it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{ma} from S_m and add the semantic vector s_a of the affix:

S_a = S_{ma} + i ⊗ s_a    (27)

Here, i is a unit vector of length k and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a,

S_a G_a = T_a    (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

\begin{bmatrix} S_m \\ S_a \end{bmatrix} G_a = \begin{bmatrix} T_m \\ T_a \end{bmatrix}    (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

\begin{bmatrix} S_m \\ S_{a_1} \\ S_{a_2} \\ \vdots \\ S_{a_n} \end{bmatrix} G = \begin{bmatrix} T_m \\ T_{a_1} \\ T_{a_2} \\ \vdots \\ T_{a_n} \end{bmatrix}    (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
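In R, the construction of S_a in (27) and of the augmented mapping can be sketched as follows, for a single inflectional function; further S_a/T_a blocks can be stacked in the same way to obtain the joint mapping of (30). S_m, T_m, the row index idx of the attested bases, the affix vector s_a, and the triphone matrix T_a of the inflected forms are all assumed to be given.

    S_a   <- S_m[idx, ] + matrix(s_a, nrow = length(idx), ncol = length(s_a),
                                 byrow = TRUE)     # eq. (27): add s_a to each row
    S_aug <- rbind(S_m, S_a)                       # augmented semantic matrix
    T_aug <- rbind(T_m, T_a)                       # augmented triphone matrix
    G     <- MASS::ginv(S_aug) %*% T_aug           # single joint mapping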

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.



Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
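A hedged sketch of this type of generalized additive mixed model in R with mgcv is given below; the data frame and column names are hypothetical placeholders, and Word and Speaker are assumed to be factors.

    library(mgcv)
    m <- bam(LogRelDur ~ LogEdgeWeight + PhraseRate + LogNcount +
               s(LogFrequency) +                              # smooth for frequency
               s(Word, bs = "re") + s(Speaker, bs = "re"),    # random intercepts
             data = buckeye)
    summary(m)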

The adverse effects of weaknesses in a word's path in the graph, where the path branches, are of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients     Estimate      Std. Error    t-value     p-value
Intercept                      -1.9165       0.0592        -32.3908    < 0.0001
Log LDL Edge Weight            -0.1358       0.0400         -3.3960      0.0007
Phrase Rate                     0.0047       0.0021          2.2449      0.0248
Log Ncount                      0.1114       0.0189          5.8837    < 0.0001

B. smooth terms                edf           Ref.df        F-value     p-value
TPRS log Frequency             1.9007        1.9050         7.5495       0.0004
random intercepts word         1149.8839     1323.0000     31.4919     < 0.0001
random intercepts speaker      30.2995       39.0000       34.5579     < 0.0001


5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises first of all an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncracies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncracies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
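A minimal sketch of this toy simulation, assuming that the cue matrix C (one row per orthographic word, with row names) and the matrix L of the corresponding lexomes' semantic vectors have already been set up from Table 8, and that SemTarget (a hypothetical name) holds the semantic vectors of the named-entity lexomes:

    Fmat  <- MASS::ginv(C) %*% L
    L_hat <- C %*% Fmat                             # estimated semantic vectors
    jw    <- L_hat["John", ] + L_hat["Welsch", ]    # summed over the two fixations
    cor(jw, SemTarget["JohnWelsch", ])              # support for the named entity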

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants.
    JohnClark write great book about ant
John Clark published a great book about dangerous ants.
    JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes.
    JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing.
    JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords.
    JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords.
    JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students.
    AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students.
    AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students.
    JanetClark teach graphtheory to second year student


Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.


Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.



Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.


Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
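For concreteness, a sketch of such trial-by-trial updating with the Widrow-Hoff rule is given below; cues and targets are assumed to be matrices with one learning event per row, and eta is the learning rate.

    widrow_hoff <- function(cues, targets, eta = 0.01) {
      W <- matrix(0, ncol(cues), ncol(targets))
      for (i in seq_len(nrow(cues))) {
        ci <- cues[i, , drop = FALSE]                 # 1 x n cue vector
        ei <- targets[i, , drop = FALSE] - ci %*% W   # prediction error
        W  <- W + eta * t(ci) %*% ei                  # delta-rule update
      }
      W
    }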

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabascan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapivissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study we have obtained good results with asemantic matrix with a dimensionality of around 4000lexomes By itself we find it surprising that a semantic vectorspace can be scaffolded with a mere 4000 words But thisfindingmay also shed light on the phenomenon of childhoodamnesia [163 164] Bauer and Larkina [165] pointed outthat young children have autobiographical memories thatthey may not remember a few years later They argue thatthe rate of forgetting diminishes as we grow older andstabilizes in adulthood Part of this process of forgettingmay relate to the changes in the semantic matrix Cruciallywe expect the correlational structure of lexomes to change

considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.
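The equivalence claimed here can be illustrated with a small simulation. The sketch below uses toy matrices and is not the implementation used in the study; it trains a form-to-meaning network trial by trial with the Widrow-Hoff (delta) rule and compares the resulting weights with the closed-form estimate based on the generalized inverse.

```r
# Minimal sketch: incremental Widrow-Hoff learning versus the closed-form
# estimate F = C'S with the generalized inverse (toy matrices, not the
# study's cue and semantic matrices).
library(MASS)  # for ginv()

set.seed(1)
C <- matrix(rbinom(40, 1, 0.4), nrow = 8)   # 8 word tokens, 5 form cues
S <- matrix(rnorm(24), nrow = 8)            # 8 word tokens, 3 semantic dimensions

F_closed <- ginv(C) %*% S                   # closed-form estimate

F_inc <- matrix(0, nrow = ncol(C), ncol = ncol(S))
eta   <- 0.01                               # learning rate
for (epoch in 1:5000) {
  for (i in sample(nrow(C))) {              # random presentation order
    cue   <- C[i, , drop = FALSE]           # 1 x 5 cue vector
    error <- S[i, , drop = FALSE] - cue %*% F_inc
    F_inc <- F_inc + eta * t(cue) %*% error # delta-rule update
  }
}

max(abs(F_inc - F_closed))                  # the two estimates are close
```

With the weights initialized at zero and a small learning rate, the incremental weights settle close to the closed-form solution; exact agreement holds in expectation, as stated above.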

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes as well as lexomes for derivational and inflectional functions are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of

the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of

categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What

is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

$$\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A1}$$

Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

$$\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2}$$

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

$$\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3}$$

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

$$\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4}$$

Such a mapping of A onto B is given by a matrix F:

$$\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5}$$

and it is straightforward to verify that indeed

$$\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6}$$

Using matrix notation we can write

$$\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A7}$$

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8}$$

For numbers, the inverse of multiplication with s is dividing by s:

$$\left(\frac{1}{s}\right)(s x) = x. \tag{A9}$$

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:

$$\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10}$$

For nonsquare matrices, the inverse Y⁻¹ is defined such that

$$\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11}$$

We find the square matrix F that maps the square matrices A onto B as follows:

$$\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B} \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \tag{A12}$$

Since for the present example A and its inverse happen to be identical (A⁻¹ = A), we obtain (A6).
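The worked example can be checked directly in base R; the variable names below are ours (Fc for the matrix F, chosen so as not to mask R's built-in F shorthand for FALSE):

```r
# A base-R check of the worked example in (A2)-(A12).
A  <- matrix(c(3, -2, 4, -3), nrow = 2)  # rows: the form vectors a and b
B  <- matrix(c(-5, 4, 2, -1), nrow = 2)  # rows: the semantic vectors x and y

Fc <- solve(A) %*% B                     # F = A^{-1} B, cf. (A12)
Fc                                       # the matrix F of (A5)
A %*% Fc                                 # reproduces B, cf. (A6) and (A7)
```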

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

$$\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A13}$$

Solving this equation proceeds exactly as above:

$$\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A} \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \tag{A14}$$

The inverse of B is

$$\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}, \tag{A15}$$

and hence we have

$$\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} = \mathbf{G}. \tag{A16}$$

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
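As a small illustration, ginv() can be applied to the toy matrix B of (A3); because B happens to be invertible, the generalized inverse coincides with the ordinary inverse, whereas for the singular matrices used in the study only the generalized inverse is available. The variable names below are ours:

```r
# Generalized inverse with ginv() from MASS, applied to the toy matrices.
library(MASS)
A <- matrix(c(3, -2, 4, -3), nrow = 2)
B <- matrix(c(-5, 4, 2, -1), nrow = 2)

ginv(B)            # here identical (up to rounding) to solve(B)
G <- ginv(B) %*% A # the production mapping G of (A14)/(A16)
B %*% G            # recovers the form matrix A, cf. (A13)
```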

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalizes to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

$$\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix}. \tag{A17}$$

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

$$\mathbf{s} = (2, 2), \tag{A18}$$

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

$$\begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix}. \tag{A19}$$
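The same calculation can be carried out in R (a sketch with our own variable names):

```r
# Productivity of the linear map: the transformation matrix F of (A5) sends
# the novel form vector s = (2, 2) to a novel semantic vector, cf. (A19).
Fm <- matrix(c(1, -2, 2, -1), nrow = 2)  # F; named Fm to avoid masking F/FALSE
s  <- c(2, 2)
s %*% Fm                                 # yields (-2, 2)
```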

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5, and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, which is conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
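As an illustration of this heuristic, the sketch below builds a toy triphone graph with igraph and selects the best length-weighted path. The triphones (for the word 'walks') and their support values are invented for illustration; this is not the WpmWithLdl implementation used in the study.

```r
# Toy sketch of graph-based triphone sequencing with igraph.
library(igraph)

# candidate triphones with invented network support; '#' marks the word boundary
triphones <- data.frame(
  vertex  = c("#wa", "wal", "alk", "lks", "ks#", "lkt"),
  support = c(0.90,  0.80,  0.70,  0.60,  0.50,  0.20),
  stringsAsFactors = FALSE
)

# an edge connects two triphones when segments 2-3 of the first are
# identical to segments 1-2 of the second
overlaps <- function(x, y) substr(x, 2, 3) == substr(y, 1, 2)
pairs <- expand.grid(from = triphones$vertex, to = triphones$vertex,
                     stringsAsFactors = FALSE)
edges <- pairs[mapply(overlaps, pairs$from, pairs$to) & pairs$from != pairs$to, ]

g <- graph_from_data_frame(edges, directed = TRUE, vertices = triphones)

initial <- triphones$vertex[startsWith(triphones$vertex, "#")]
final   <- triphones$vertex[endsWith(triphones$vertex, "#")]

# all simple paths from any word-initial to any word-final triphone
paths <- unlist(lapply(initial, function(v)
  all_simple_paths(g, from = v, to = final)), recursive = FALSE)

# weight the summed support of each path by its length and pick the best path
score <- sapply(paths, function(p)
  sum(triphones$support[match(as_ids(p), triphones$vertex)]) / length(p))
as_ids(paths[[which.max(score)]])   # "#wa" "wal" "alk" "lks" "ks#"
```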

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010

[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983

[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005

[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018

[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010

Table 1: Linear mixed model fit to paired associate learning scores, with the correlation r of the row vectors in S of the paired words as predictor. Treatment coding was used for factors. For the youngest age group, r is not predictive, but all other age groups show increasingly large slopes compared to the slope of the youngest age group. The range of r is [-4.88, -2.53]; hence larger values for the coefficients of r and its interactions imply worse performance on the PAL task.

(A) Parametric coefficients    Estimate   Std. Error    t-value    p-value
intercept                        3.6220       0.6822     5.3096   < 0.0001
r                                0.3058       0.1672     1.8293     0.0691
age=39                           0.2297       0.1869     1.2292     0.2207
age=49                           0.4665       0.1869     2.4964     0.0135
age=59                           0.6078       0.1869     3.2528     0.0014
age=69                           0.8029       0.1869     4.2970   < 0.0001
sex=male                        -0.1074       0.0230    -4.6638   < 0.0001
r:age=39                         0.1167       0.0458     2.5490     0.0117
r:age=49                         0.2090       0.0458     4.5640   < 0.0001
r:age=59                         0.2463       0.0458     5.3787   < 0.0001
r:age=69                         0.3239       0.0458     7.0735   < 0.0001

(B) Smooth terms                     edf      Ref.df    F-value    p-value
random intercepts word pair      17.8607     18.0000   128.2283   < 0.0001

Table 2: Summary of a Gaussian location-scale additive model fitted to the similarity ratings in the MEN dataset, with as predictor the correlation r of words' semantic vectors in the S matrix. TPRS: thin plate regression spline; b: the minimum standard deviation for the logb link function. Location parameter: estimates the mean; scale parameter: estimates the variance through the logb link function (η = log(σ - b)).

(A) Parametric coefficients    Estimate   Std. Error    t-value    p-value
Intercept [location]            25.2990       0.1799   140.6127   < 0.0001
Intercept [scale]                2.1297       0.0149   143.0299   < 0.0001

(B) Smooth terms                    edf      Ref.df     F-value    p-value
TPRS r [location]                8.3749      8.8869   2766.2399   < 0.0001
TPRS r [scale]                   5.9333      7.0476     87.5384   < 0.0001
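For readers who wish to fit comparable models, the sketch below shows one way to specify them with the mgcv package. The data frame names, column names, and exact formulas are our assumptions for illustration, not the authors' scripts: pal is assumed to hold the PAL scores with columns score, r, age, sex, and wordpair (the latter three as factors), and men the MEN ratings with columns rating and r.

```r
# A hedged sketch of models like those in Tables 1 and 2, using mgcv.
# The data frames 'pal' and 'men' and their column names are assumptions.
library(mgcv)

# Table 1: linear mixed model with by-word-pair random intercepts,
# here fitted as a GAM with a random-effect smooth (bs = "re")
m1 <- gam(score ~ r * age + sex + s(wordpair, bs = "re"), data = pal)

# Table 2: Gaussian location-scale model; the first formula models the mean,
# the second the scale parameter, both as thin plate regression splines of r
m2 <- gam(list(rating ~ s(r), ~ s(r)), data = men, family = gaulss())
summary(m2)
```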

words, performance of our linear network with linguistically motivated lexomes is sufficiently high to serve as a basis for further analysis.

3.2.3. Correlational Structure of Inflectional and Derivational Vectors. Now that we have established that the semantic vector space S, obtained with perhaps the simplest possible error-driven learning algorithm discriminating between outcomes given multiple cues (1), indeed captures semantic similarity, we next consider how the semantic vectors of inflectional and derivational lexomes cluster in this space. Figure 2 presents a heatmap for the correlation matrix of the functional lexomes that we implemented as described in Section 3.1.

Nominal and verbal inflectional lexomes, as well as adverbial ly, cluster in the lower left corner and tend to be either not correlated with derivational lexomes (indicated by light yellow) or to be negatively correlated (more reddish colors). Some inflectional lexomes, however, are interspersed with derivational lexomes. For instance, comparative, superlative, and perfective form a small subcluster together within the group of derivational lexomes.

The derivational lexomes show small positive correlations for a majority of pairs, and stronger correlations in the case of the verb-forming derivational lexomes mis, over, under, and undo, which among themselves show the strongest positive correlations of all. The derivational lexomes creating abstract nouns, ion, ence, ity, and ness, also form a subcluster. In other words, Figure 2 shows that there is some structure to the distribution of inflectional and derivational lexomes in the semantic vector space.

Derived words (but not inflected words) have their own content lexomes, and hence their own row vectors in S. Do the semantic vectors of these words cluster according to their formatives? To address this question, we extracted the row vectors from S for a total of 3500 derived words for 31 different formatives, resulting in a 3500 × 23562 matrix D sliced out of S. To this matrix we added the semantic vectors for the 31 formatives. We then used linear discriminant analysis (LDA) to predict a lexome's formative from its semantic vector, using the lda function from the MASS package [82] (for LDA to work, we had to remove columns from D with very small variance; as a consequence, the dimension of the matrix that was the input to LDA was 3531 × 814). LDA accuracy for this classification task with 31 possible outcomes was 0.72. All 31 derivational lexomes were correctly classified.

Figure 2: Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)

When the formatives are randomly permuted, thus breaking the relation between the semantic vectors and their formatives, LDA accuracy was on average 0.528 (range across 10 permutations: 0.519–0.537). From this we conclude that derived words show more clustering in the semantic space of S than can be expected under randomness.
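This classification analysis can be sketched in R roughly as follows (a minimal sketch; the matrix D, the variance cutoff, and the factor formative are placeholders for the objects described above, not the authors' actual code):

library(MASS)
# D: 3500 x 23562 matrix with the semantic (row) vectors of the derived words
# formative: factor with 31 levels giving each derived word's formative
keep <- apply(D, 2, var) > 1e-8              # drop near-constant columns (illustrative cutoff)
D2   <- D[, keep]
fit  <- lda(x = D2, grouping = formative)
mean(predict(fit, D2)$class == formative)    # LDA classification accuracy
# permutation baseline: break the relation between vectors and formatives
perm <- sample(formative)
mean(predict(lda(x = D2, grouping = perm), D2)$class == perm)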

How are the semantic vectors of derivational lexomes positioned with respect to the clusters of their derived words? To address this question, we constructed heatmaps for the correlation matrices of the pertinent semantic vectors. An example of such a heatmap is presented for ness in Figure 3. Apart from the existence of clusters within the cluster of ness content lexomes, it is striking that the ness lexome itself is found at the very left edge of the dendrogram, and at the very left column and bottom row of the heatmap. The color coding indicates that, surprisingly, the ness derivational lexome is negatively correlated with almost all content lexomes that have ness as formative. Thus, the semantic vector of ness is not a prototype at the center of its cloud of exemplars, but an antiprototype: this vector is close to the cloud of semantic vectors, but it is outside its periphery. This pattern is not specific to ness, but is found for the other derivational lexomes as well. It is intrinsic to our model.

Figure 3: Heatmap for the correlation matrix of the lexomes of content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)

The reason for this is straightforward. During learning, although a derived word's lexome i and its derivational lexome ness are co-present cues, the derivational lexome occurs in many other words j, and each time such another word j is encountered, weights are reduced from ness to i. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property of our model for capturing morphological productivity in comprehension and speech production.

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed over the vectors obtained by subtracting the vector of the base word from that of the derived word, the result is a vector that is embedded inside the cluster of the derived words' vectors, and hence inherits semantic idiosyncrasies from all these derived words.

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs consisting of a base and a novel derived form (e.g., accent/accentable) reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs due to the constraint that the base words had to occur in the tasa corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance from the origin), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates a high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].
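In terms of the S matrix, this measure is simply the row-wise L1-norm; a one-line R sketch, assuming S holds the semantic vectors as rows:

# activation diversity: L1-norm of each semantic (row) vector of S
activation_diversity <- rowSums(abs(S))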

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases, and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies stronger links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive and were therefore left out of the specification of the model reported here (word frequency was not included as a predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by computing the correlation of (i) the semantic vector of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = -0.66), effectiveness (r = -0.51), awareness (r = -0.50), loneliness (r = -0.45), sickness (r = -0.44), and consciousness (r = -0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) and its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
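This comparison amounts to a single correlation per word; a minimal R sketch, assuming the rows of S are labeled with lexome names (the row labels used here are purely illustrative):

# correlation of a derived word's own semantic vector with the vector
# obtained by adding the affix vector to its base word's vector
transparency <- function(derived, base, affix, S)
  cor(S[derived, ], S[base, ] + S[affix, ])
transparency("business", "busy", "NESS", S)   # strongly negative for opaque words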

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the tasa corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall, transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output. (A package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz. The package is described in Baayen et al. [85].) We begin by introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension, and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

\[
\mathbf{C} = \begin{array}{l|cccccccc}
             & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array} \tag{12}
\]

where we use the DISC keyboard phonetic alphabet (the "DIstinct Single Character" representation, with one character for each phoneme, was introduced by CELEX [86]) for the triphones; the symbol # denotes the word boundary. Suppose that the semantic vectors for these words are the row vectors of the following matrix S:

\[
\mathbf{S} = \begin{array}{l|ccc}
             & \text{one} & \text{two} & \text{three} \\ \hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array} \tag{13}
\]

Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients               Estimate   Std. Error   t-value   p-value
Intercept                                 -1.8072       2.7560   -0.6557    0.5127
Word Length                                0.5901       0.2113    2.7931    0.0057
Activation Diversity                       1.1221       0.6747    1.6632    0.0976
Arousal                                    0.3847       0.1521    2.5295    0.0121
Word Length × Activation Diversity        -0.1318       0.0500   -2.6369    0.0089
B. smooth terms                               edf       Ref.df   F-value   p-value
Random Intercepts Affix                    3.3610       4.0000   55.6723  < 0.0001

We are interested in a transformation matrix F such that
\[ \mathbf{C}\mathbf{F} = \mathbf{S}. \tag{14} \]
The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then
\[ \mathbf{F} = \mathbf{C}'\mathbf{S}. \tag{15} \]

For the present example,
\[
\mathbf{F} = \begin{array}{l|ccc}
              & \text{one} & \text{two} & \text{three} \\ \hline
\text{\#wV}   & 0.33 & 0.10 & 0.13 \\
\text{wVn}    & 0.33 & 0.10 & 0.13 \\
\text{Vn\#}   & 0.33 & 0.10 & 0.13 \\
\text{\#tu}   & 0.10 & 0.50 & 0.05 \\
\text{tu\#}   & 0.10 & 0.50 & 0.05 \\
\text{\#Tr}   & 0.03 & 0.03 & 0.33 \\
\text{Tri}    & 0.03 & 0.03 & 0.33 \\
\text{ri\#}   & 0.03 & 0.03 & 0.33
\end{array} \tag{16}
\]
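The toy example can be verified in R along the following lines (a minimal sketch using the matrices of (12) and (13); not the authors' own code):

library(MASS)
# cue matrix C of (12): words (rows) by triphones (columns)
C <- matrix(c(1,1,1,0,0,0,0,0,
              0,0,0,1,1,0,0,0,
              0,0,0,0,0,1,1,1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one","two","three"),
                            c("#wV","wVn","Vn#","#tu","tu#","#Tr","Tri","ri#")))
# semantic matrix S of (13)
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one","two","three"), c("one","two","three")))
Fmat <- ginv(C) %*% S     # the transformation matrix of (15); compare (16)
round(C %*% Fmat, 2)      # recovers S exactly for this toy example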

For this simple example, CF is exactly equal to S.

In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams. A comparison of the performance of the two models sheds light on the role of words' "sound image" on word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words; te: tensor product smooth, s: thin plate regression spline.

A. parametric coefficients                             Estimate   Std. Error   t-value   p-value
intercept [location]                                     5.5016       0.2564   21.4537  < 0.0001
intercept [scale]                                       -0.4135       0.0401  -10.3060  < 0.0001
B. smooth terms                                             edf       Ref.df   F-value   p-value
te(activation diversity, word length) [location]         5.3879       6.2897   23.7798    0.0008
s(derivational lexome) [location]                       14.3180      15.0000  380.6254  < 0.0001
s(activation diversity) [scale]                          2.1282       2.5993   28.4336  < 0.0001
s(word length) [scale]                                   1.7722       2.1403  132.7311  < 0.0001

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when its predicted vector ŝ is more strongly correlated with its own target semantic vector s than with the target vector of any other word j.
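This evaluation criterion can be sketched in R as follows (Shat and S are placeholders for the matrices of predicted and targeted semantic vectors, one word per row):

# Shat: predicted semantic vectors; S: targeted semantic vectors (same row order)
R <- cor(t(Shat), t(S))       # R[i, j]: correlation of prediction i with target j
accuracy <- mean(apply(R, 1, which.max) == seq_len(nrow(S)))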

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, to a total of 11480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), (iii) the word being available in the CELEX database, and (iv) the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3465 unique trigrams, resulting in an 11480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10⁻⁷. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, the performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated, in the mean, with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derived words' semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again, with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping an 11480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual-route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ1, obtained with network F of the first route, and a vector ŝ2, obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
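The two routes can be sketched in R as follows (a schematic outline; C_letters, C_phones, and S stand for the trigram, triphone, and semantic matrices described above, and the variable names are purely illustrative):

library(MASS)
# route 1: orthography -> semantics
F1    <- ginv(C_letters) %*% S
S1hat <- C_letters %*% F1
# route 2, step 1: letter trigrams -> triphones (network K)
K     <- ginv(C_letters) %*% C_phones
That  <- C_letters %*% K           # predicted triphone support
# route 2, step 2: predicted triphones -> semantics (network H)
H     <- ginv(That) %*% S
S2hat <- That %*% H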

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
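Continuing the sketch above, the three measures can be computed along these lines (the prior is taken from the full, square lexome-by-lexome matrix S; names remain illustrative):

total_activation_diversity <- rowSums(abs(S1hat)) + rowSums(abs(S2hat))
route_congruency <- sapply(seq_len(nrow(S1hat)),
                           function(i) cor(S1hat[i, ], S2hat[i, ]))
prior <- colSums(abs(S))   # L1-norm of each lexome's column vector in S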

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
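The structure of this model (cf. Table 5) can be sketched with the mgcv package; the data frame blp and its column names are placeholders, not the authors' actual variable names:

library(mgcv)
# blp: one row per word, with invRT (inverse-transformed latency), word type,
#      length, total activation diversity (tad), route congruency (rc), and prior
m <- gam(invRT ~ wordtype + length + s(tad) + te(rc, prior), data = blp)
summary(m)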

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel). For medium-range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM was provided access to just the activation diversity of ŝ2, the AIC increased by 10059 units. However, when the model was based on only the activation diversity of ŝ1, the AIC increased by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

Figure 5: The partial effects of total activation diversity (left) and of the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl; s: thin plate regression spline smooth, te: tensor product smooth.

A. parametric coefficients           Estimate   Std. Error   t-value   p-value
intercept                             -1.7774       0.0087  -205.414  < 0.0001
word type=Inflected                    0.1110       0.0059    18.742  < 0.0001
word type=Monomorphemic                0.0233       0.0054     4.322  < 0.0001
word length                            0.0117       0.0011    10.981  < 0.0001
B. smooth terms                           edf       Ref.df         F   p-value
s(total activation diversity)           6.002        7.222      23.6  < 0.0001
te(route congruency, prior)            14.673       17.850     21.37  < 0.0001

Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.

Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl; te: tensor product smooth.

A. parametric coefficients                    Estimate   Std. Error   t-value   p-value
intercept                                      -1.6934       0.0086  -195.933  < 0.0001
word type=Inflected                             0.0286       0.0052     5.486  < 0.0001
word type=Monomorphemic                         0.0042       0.0055     0.770       0.4
word length                                     0.0066       0.0011     5.893  < 0.0001
B. smooth terms                                    edf       Ref.df         F   p-value
te(activation, activation diversity, prior)      41.34        49.72     10.84  < 0.0001

The superiority of the second route may in part be due to the number of triphone features being larger than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations; https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual-route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as its primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL-scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk, and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts, provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as "clean" for relatively clean parts where there is speech without background noise or music, and as "noisy" for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, for a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a, with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus \((\begin{smallmatrix} 3 & 8 \\ 7 & 2 \end{smallmatrix})^{T} = (\begin{smallmatrix} 3 & 7 \\ 8 & 2 \end{smallmatrix})\). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus, a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

\[
\begin{aligned}
\mathbf{C}\mathbf{F} &= \mathbf{S} \\
\mathbf{C}^{T}\mathbf{C}\mathbf{F} &= \mathbf{C}^{T}\mathbf{S} \\
(\mathbf{C}^{T}\mathbf{C})^{-1}(\mathbf{C}^{T}\mathbf{C})\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S} \\
\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S}
\end{aligned} \tag{17}
\]

In this way, matrix inversion is required only for a much smaller matrix, C^T C, which is a square matrix of size 40639 × 40639.
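In R, this normal-equations shortcut can be sketched as follows (C and S stand for the FBSF and semantic matrices just described; if C^T C happens to be singular, a generalized inverse can be substituted for solve):

CtC  <- crossprod(C)        # C^T C: 40639 x 40639
CtS  <- crossprod(C, S)     # C^T S
Fmat <- solve(CtC, CtS)     # F = (C^T C)^{-1} C^T S, cf. (17)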

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model: when presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

\[
\mathbf{S} = \begin{array}{l|ccc}
             & \text{one} & \text{two} & \text{three} \\ \hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array} \tag{18}
\]

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:

\[
\mathbf{T} = \begin{array}{l|cccccccc}
             & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array} \tag{19}
\]

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve
\[ \mathbf{S}\mathbf{G} = \mathbf{T}. \tag{20} \]
Given G, we can predict for any semantic vector s the vector of triphones t̂ that quantifies the support for the triphones provided by s, simply by multiplying s with G:
\[ \hat{\mathbf{t}} = \mathbf{s}\mathbf{G}. \tag{21} \]

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since
\[ \mathbf{S}'\mathbf{S}\mathbf{G} = \mathbf{S}'\mathbf{T}, \tag{22} \]
we have
\[ \mathbf{G} = \mathbf{S}'\mathbf{T}. \tag{23} \]
Given G, we can predict the triphone matrix T̂ from the semantic matrix S:
\[ \mathbf{S}\mathbf{G} = \hat{\mathbf{T}}. \tag{24} \]

For the present example, the generalized inverse S′ of S is
\[
\mathbf{S}' = \begin{array}{l|ccc}
              & \text{one} & \text{two} & \text{three} \\ \hline
\text{one}    & 1.10 & -0.29 & -0.41 \\
\text{two}    & -0.21 & 1.07 & -0.02 \\
\text{three}  & -0.09 & -0.08 & 1.04
\end{array} \tag{25}
\]

and the transformation matrix G is
\[
\mathbf{G} = \begin{array}{l|cccccccc}
             & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one}   & 1.10 & 1.10 & 1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \\
\text{two}   & -0.21 & -0.21 & -0.21 & 1.07 & 1.07 & -0.02 & -0.02 & -0.02 \\
\text{three} & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 & 1.04 & 1.04 & 1.04
\end{array} \tag{26}
\]

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
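A minimal R sketch of this toy production mapping, using the matrices of (18) and (19):

library(MASS)
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one","two","three"), c("one","two","three")))
Tmat <- matrix(c(1,1,1,0,0,0,0,0,
                 0,0,0,1,1,0,0,0,
                 0,0,0,0,0,1,1,1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("one","two","three"),
                               c("#wV","wVn","Vn#","#tu","tu#","#Tr","Tri","ri#")))
G <- ginv(S) %*% Tmat               # production network, cf. (23) and (26)
round(S %*% G, 2)                   # predicted triphone support, cf. (24)
S["two", , drop = FALSE] %*% G      # support vector for a single word, cf. (21)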

We made use of the S matrix which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies, for each of w words, an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m); those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words: targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function from the igraph package [127] (a path is simple if the vertices it visits are not visited more than once) to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
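The ordering step can be illustrated in R for the toy word three (a minimal sketch of the heuristic; the full algorithm, including its length penalty, is given in the Appendix):

library(igraph)
# triphones whose activation exceeds the 0.99 threshold (here for 'three')
active <- c("#Tr", "Tri", "ri#")
# connect triphone x to triphone y when they overlap in two phones
pairs <- expand.grid(from = active, to = active, stringsAsFactors = FALSE)
pairs <- pairs[pairs$from != pairs$to &
               substr(pairs$from, 2, 3) == substr(pairs$to, 1, 2), ]
g <- graph_from_data_frame(pairs, vertices = data.frame(name = active))
# collect all simple paths starting at a word-initial (#-initial) triphone
starts <- grep("^#", active, value = TRUE)
paths  <- unlist(lapply(starts, function(v) all_simple_paths(g, from = v)),
                 recursive = FALSE)
# the longest path spans the word: "#Tr" "Tri" "ri#"
as_ids(paths[[which.max(lengths(paths))]])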

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the Appendix, and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S119898 and T119898 denote the submatrices of S and T thatcontain the semantic and triphone vectors of monolexomicwords Assume that a subset of 119896 of these monolexomicwords is attested in the training corpus with inflectionalfunction 119886 Let T119886 denote the matrix with the triphonevectors of these inflected words and let S119886 denote thecorresponding semantic vectors To obtain S119886 we take thepertinent submatrix S119898119886 from S119898 and add the semantic vectors119886 of the affix

S_a = S_{m_a} + i ⊗ s_a   (27)

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a   (28)

but in this set-up the learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

[S_m; S_a] G_a = [T_m; T_a]   (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for the base words and all inflected words jointly:

[S_m; S_{a_1}; S_{a_2}; ...; S_{a_n}] G = [T_m; T_{a_1}; T_{a_2}; ...; T_{a_n}]   (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
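The following sketch, with small random stand-in matrices, shows how the semantic vectors of (27) and the augmented matrices of (30) can be assembled in R and how the single mapping G is then estimated; the object names and dimensions are illustrative only.

```r
# Sketch of (27) and (30) with toy matrices: S_m / T_m for monolexomic words,
# s_a1 the semantic vector of one inflectional function, S_ma1 / T_a1 the
# base-word and triphone matrices of the words attested with that function.
library(MASS)
set.seed(1)
S_m   <- matrix(rnorm(6 * 4), 6, 4)
T_m   <- matrix(rbinom(6 * 8, 1, 0.3), 6, 8)
S_ma1 <- S_m[1:3, ]                           # bases attested with function a1
T_a1  <- matrix(rbinom(3 * 8, 1, 0.3), 3, 8)  # triphone vectors of the inflected forms
s_a1  <- rnorm(4)                             # semantic vector of the inflectional function

S_a1 <- S_ma1 + rep(1, nrow(S_ma1)) %o% s_a1  # equation (27): stack copies of s_a1 row-wise

S_aug <- rbind(S_m, S_a1)                     # left-hand side of (30)
T_aug <- rbind(T_m, T_a1)
G     <- ginv(S_aug) %*% T_aug                # one mapping for all inflectional functions
```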

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the CELEX database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with CELEX phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus,


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.
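One fold of this procedure can be sketched as follows; the matrices and the 90/10 split are toy stand-ins for the actual data.

```r
# Sketch of a single cross-validation fold: estimate G on the training rows and
# predict the triphone vectors of the held-out inflected forms.
library(MASS)
set.seed(1)
S_all <- matrix(rnorm(100 * 20), 100, 20)          # semantic vectors (stems + inflected)
T_all <- matrix(rbinom(100 * 30, 1, 0.2), 100, 30) # corresponding triphone vectors

heldout <- sample(1:100, 10)                       # the 10% of inflected forms left out
G       <- ginv(S_all[-heldout, ]) %*% T_all[-heldout, ]
t_hat   <- S_all[heldout, ] %*% G                  # predicted triphone vectors (cf. (21))
```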

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones, without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z], sewer, which the model produced with R instead of only R, and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the CELEX target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for trigrams changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


[Figure 9 appears here: the directed triphone graph for blending, with edge weights on the transitions.]

Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI, henceforth 'branching segment'), for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
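A sketch of such a model in R is given below, using the mgcv package; the data frame buckeye and its column names are hypothetical, and the actual model specification may differ in detail.

```r
# Sketch (assumed column names): GAMM for log relative duration of the branching
# segment, with random intercepts for word and speaker, as summarized in Table 7.
library(mgcv)
set.seed(1)
buckeye <- data.frame(                            # toy data standing in for the corpus
  RelDuration   = runif(400, 0.05, 0.5),
  LogEdgeWeight = rnorm(400),
  PhraseRate    = rnorm(400),
  LogNcount     = rnorm(400),
  LogFrequency  = rnorm(400),
  Word          = factor(sample(letters, 400, replace = TRUE)),
  Speaker       = factor(sample(1:40, 400, replace = TRUE))
)

m <- bam(log(RelDuration) ~ LogEdgeWeight + PhraseRate + LogNcount +
           s(LogFrequency) +            # thin plate regression spline (TPRS)
           s(Word, bs = "re") +         # random intercepts for word
           s(Speaker, bs = "re"),       # random intercepts for speaker
         data = buckeye)
summary(m)
```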

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients     Estimate     Std. Error   t-value     p-value
Intercept                      -1.9165      0.0592       -32.3908    < 0.0001
Log LDL Edge Weight            -0.1358      0.0400       -3.3960     0.0007
Phrase Rate                    0.0047       0.0021       2.2449      0.0248
Log Ncount                     0.1114       0.0189       5.8837      < 0.0001

B. smooth terms                edf          Ref.df       F-value     p-value
TPRS log Frequency             1.9007       1.9050       7.5495      0.0004
random intercepts word         1149.8839    1323.0000    31.4919     < 0.0001
random intercepts speaker      30.2995      39.0000      34.5579     < 0.0001

maps (TSOMs, [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
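For completeness, this is roughly how such a layout can be produced with igraph; the ring graph is just a stand-in for a triphone graph, and the parameter values are illustrative.

```r
# Sketch: a graphopt layout as used for Figure 9 (toy graph instead of the
# actual triphone graph).
library(igraph)
g      <- make_ring(10, directed = TRUE)
coords <- layout_with_graphopt(g, charge = 0.02, mass = 30, spring.length = 0)
plot(g, layout = coords, vertex.size = 20, edge.arrow.size = 0.4)
```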

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the CELEX database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from CELEX and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the TASA corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the WEAVER model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to prevent graphs from becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content


lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk ([natyrlk])) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
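The construction just described can be sketched as follows. The dimensions and cell values below are toy stand-ins, as the real C and L matrices are built from the trigrams and lexomes of the sentences in Table 8.

```r
# Sketch: solve C F = L for the named-entity example and correlate an estimated
# semantic vector with the targeted semantic vectors of the named entities.
library(MASS)
set.seed(1)
n_dim   <- 20
targets <- matrix(rnorm(5 * n_dim), 5, n_dim,
                  dimnames = list(c("JohnClark", "JohnWelsch", "JohnWiggam",
                                    "AnneHastie", "JanetClark"), NULL))

C <- matrix(rbinom(12 * 40, 1, 0.2), 12, 40)     # 12 word tokens x 40 letter trigrams
L <- targets[sample(1:5, 12, replace = TRUE), ]  # each token paired with a lexome vector

Fm    <- ginv(C) %*% L                           # the mapping F of the text
L_hat <- C %*% Fm                                # estimated (predicted) semantic vectors

# correlations of the first token's estimated vector with the five targets (cf. Figure 10)
apply(targets, 1, function(target) cor(L_hat[1, ], target))
```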

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants.
John Clark published a great book about dangerous ants.
John Welsch makes great photographs of airplanes.
John Welsch has a collection of great photographs of airplanes landing.
John Wiggam will build great harpsichords.
John Wiggam will be a great expert in tuning harpsichords.
Anne Hastie has taught statistics to great second year students.
Anne Hastie has taught probability to great first year students.
Janet Clark teaches graph theory to second year students.

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

[Figure 10 appears here: four panels of Pearson correlations, one each for the word forms John, John Welsch, Janet, and Clark, with the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark on the horizontal axis.]

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].
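A rough sketch of this indirect reading route, with toy matrices standing in for the real trigram, triphone, and semantic matrices, is given below; the object names are assumptions for illustration only.

```r
# Sketch: trigrams -> triphones (network K_a), triphones -> semantics (network H_a).
library(MASS)
set.seed(1)
C_tri <- matrix(rbinom(30 * 50, 1, 0.15), 30, 50)  # words x letter trigrams
T_pho <- matrix(rbinom(30 * 60, 1, 0.15), 30, 60)  # words x triphones (auditory targets)
S     <- matrix(rnorm(30 * 20), 30, 20)            # words x semantic dimensions

K_a   <- ginv(C_tri) %*% T_pho     # maps trigrams onto auditory targets
H_a   <- ginv(T_pho) %*% S         # maps auditory targets onto semantic vectors
S_hat <- C_tri %*% K_a %*% H_a     # reading route via auditory verbal images
```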

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see, for some evidence, [47]). One such interaction is explored


[Figure 11 appears here: a schematic linking cochlea and retina (input) and typing and speaking (output) with the model's representations, namely, auditory cues (FBS features), orthographic cues (trigrams), auditory targets (triphones), orthographic targets (trigrams), and semantic vectors (TASA), with arcs labeled F_a, F_o, H_a, K_a, G_a, G_o, C_a, C_o, T_a, T_o.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
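A minimal sketch of such trial-by-trial estimation with the Widrow-Hoff rule, for a single network and with made-up cue and outcome vectors and an assumed learning rate, is:

```r
# Sketch: incremental (Widrow-Hoff / delta rule) updating of a linear network.
set.seed(1)
n_cues <- 10; n_outcomes <- 5
eta <- 0.01                                   # learning rate (assumed value)
W   <- matrix(0, n_cues, n_outcomes)          # connection weights, start at zero

for (t in 1:1000) {                           # one learning event per iteration
  cue     <- rbinom(n_cues, 1, 0.3)           # cue vector of this event
  outcome <- rnorm(n_outcomes)                # targeted (semantic) vector of this event
  pred    <- as.vector(cue %*% W)             # what the network currently predicts
  W       <- W + eta * (cue %o% (outcome - pred))  # update driven by prediction error
}
```

In the mean, this incremental procedure converges on the same expected weights as the one-step estimate with the generalized inverse, which is why the two are used interchangeably in this study.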

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over a thousand function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented and made to work only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older, and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change


considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep the problems that typically arise in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning, and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear: networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

a = (3, 4),  b = (-2, -3)   (A1)


[Figure 12 appears here: points a, b, x, and y in the plane, with both axes ranging from -4 to 4.]

Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

A = (3 4; -2 -3)   (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

B = (-5 2; 4 -1)   (A3)

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

(a1 a2; b1 b2) (x1 x2; y1 y2) = (a1x1 + a2y1  a1x2 + a2y2; b1x1 + b2y1  b1x2 + b2y2)   (A4)

Such a mapping of A onto B is given by a matrix F:

F = (1 2; -2 -1)   (A5)

and it is straightforward to verify that indeed

(3 4; -2 -3) (1 2; -2 -1) = (-5 2; 4 -1)   (A6)

Using matrix notation we can write

AF = B (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

(1 0; 0 1) (x1 x2; y1 y2) = (x1 x2; y1 y2) (1 0; 0 1) = (x1 x2; y1 y2)   (A8)

For numbers, the inverse of multiplication with s is dividing by s:

(1/s) (s x) = x   (A9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

X^{-1} X = X X^{-1} = I   (A10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

Y^{-1} (YX) = (XY) Y^{-1} = X   (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

AF = B
A^{-1} A F = A^{-1} B
I F = A^{-1} B
F = A^{-1} B   (A12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

BG = A (A13)

Solving this equation proceeds exactly as above:

BG = A
B^{-1} B G = B^{-1} A
I G = B^{-1} A
G = B^{-1} A   (A14)


The inverse of B is

B^{-1} = (1/3 2/3; 4/3 5/3)   (A15)

and hence we have

(1/3 2/3; 4/3 5/3) (3 4; -2 -3) = (-1/3 -2/3; 2/3 1/3) = G   (A16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X'.
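The worked example of this appendix can be reproduced in a few lines of R; solve() returns the exact inverse for these nonsingular toy matrices, and MASS::ginv() the generalized inverse mentioned above. The object names Fm and Gm stand for the F and G of the text.

```r
# The 2 x 2 example of Appendix A in R.
library(MASS)
A <- matrix(c(3, -2, 4, -3), 2, 2)   # rows: a = (3, 4) and b = (-2, -3)
B <- matrix(c(-5, 4, 2, -1), 2, 2)   # rows: x = (-5, 2) and y = (4, -1)

Fm <- solve(A) %*% B                 # comprehension mapping F, cf. (A12)
Gm <- solve(B) %*% A                 # production mapping G, cf. (A14)

A %*% Fm                             # recovers B
B %*% Gm                             # recovers A
ginv(A)                              # generalized inverse; here equal to solve(A)
```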

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3 4) (1 2; -2 -1) = (-5 2)   (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × -2 = -5, and for o2 we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

s = (2, 2)   (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2 2) (1 2; -2 -1) = (-2 2)   (A19)

[Figure 13 appears here: a two-layer network with input nodes i1 and i2, output nodes o1 and o2, and connection weights 1, 2, -2, -1; form features go in at the input, semantic vectors come out at the output, and the connection weights are the transformation matrix.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × -2 = -5, and that at o2 is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word-initial boundary symbol) to a word's final triphone (the triphone ending with the word-final boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
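The path-scoring step of this heuristic can be sketched as follows; the triphone set, the boundary marker '#', and the support values are made up for illustration.

```r
# Sketch: rank candidate triphone paths by their summed network support divided
# by path length (the length penalty), and select the best-supported path.
library(igraph)

triphones <- c("#pI", "pIn", "In#", "InI", "nIn")
support   <- c("#pI" = 0.9, "pIn" = 0.8, "In#" = 0.7, "InI" = 0.2, "nIn" = 0.2)

edges <- do.call(rbind, lapply(triphones, function(v) {
  nxt <- triphones[substr(triphones, 1, 2) == substr(v, 2, 3)]
  if (length(nxt)) cbind(from = v, to = nxt) else NULL
}))
g <- graph_from_edgelist(edges, directed = TRUE)

initial <- triphones[substr(triphones, 1, 1) == "#"]
final   <- triphones[substr(triphones, 3, 3) == "#"]
paths   <- unlist(lapply(initial, function(v) all_simple_paths(g, from = v, to = final)),
                  recursive = FALSE)

score <- sapply(paths, function(p) sum(support[names(p)]) / length(p))
best  <- paths[[which.max(score)]]    # the acoustic image driving articulation
names(best)
```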

It is possible that no path from an initial vertex to a finalvertex is found due to critical boundary triphones not beinginstantiated in the data on which the model was trainedThis happens when the model is assessed on novel inflected

34 Complexity

nobo

dyso

met

hing

ever

ythi

ngno

thin

gev

eryo

nean

ythi

ngan

yone

som

eone

hers

him his

via

than I

exce

ptou

tside

thei

rsw

hoev

erth

ence

min

ein

side

amon

gst its

thei

rou

rm

yyo

uyo

ur her

she

your

selv

esw

hate

ver us we it

they he

them m

em

yself

him

self

hers

elfou

rselv

esyo

urse

lfits

elfth

emse

lves

abou

tw

ithou

tw

ithin

beyo

ndam

ong

betw

een

thro

ugho

utdu

ring

since by

from in of

with at fo

r tobe

low

abov

e up offdo

wn

upon on

agai

nst

tow

ard

tow

ards

arou

ndne

aral

ong

unde

rbe

hind

besid

eth

roug

hon

toin

toov

erac

ross

nobodysomethingeverythingnothingeveryoneanythinganyonesomeonehershimhisviathanIexceptoutsidetheirswhoeverthencemineinsideamongstitstheirourmyyouyourhersheyourselveswhateverusweittheyhethemmemyselfhimselfherselfourselvesyourselfitselfthemselvesaboutwithoutwithinbeyondamongbetweenthroughoutduringsincebyfrominofwithatfortobelowaboveupoffdownupononagainsttowardtowardsaroundnearalongunderbehindbesidethroughontointooveracross

minus01 0 01 02Value

Color Keyand Histogram

050

015

0025

00C

ount

Figure 14 Heatmap for the correlations of the semantic vectors of pronouns and prepositions Note that quantifier pronouns form a groupof their own that prepositions and the other pronouns form distinguisable groups and that within groups further subclusters are visible egforwe and us she and her and across and over All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmapwhich color-codes diagonal elements with the color for the highest correlation of the range we specified 025)

and derived words under cross-validationTherefore verticesand edges are added to the graph for all nonfinal verticesthat are at the end of any of the paths starting at any initialtriphone Such novel vertices and edges are assigned zeronetwork support Typically paths from the initial vertex tothe final vertex can now be constructed and path support isevaluated as above

For an algorithm that allows triphone paths to includecycles which may be the case in languages with much richer

morphology than English see Baayen et al [85] and for codethe R packageWpmWithLdl described therein

C A Heatmap for Function Words(Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors ofpronouns and prepositions is presented in Figure 14

Complexity 35

Data Availability

With the exception of the data from theDistributed Little RedHen Lab all primary data are publicly available from the citedsources

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis Jessie Nixon ChrisWestbury and LuisMienhardt for their constructive feedbackon earlier versions of this manuscript This research wassupported by an ERC advancedGrant (no 742545) to the firstauthor

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010

36 Complexity

[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983

Complexity 37

[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005

38 Complexity

[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018

Complexity 39

[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R Jackendoff Semantic StructuresMITPress Cambridge 1990[155] J J McCarthy ldquoA prosodic theory of non-concatenative mor-

phologyrdquo Linguistic Inquiry vol 12 pp 373ndash418 1981[156] A Ussishkin ldquoA Fixed Prosodic Theory of Nonconcatenative

Templaticmorphologyrdquo Natural Language amp Linguistic Theoryvol 23 no 1 pp 169ndash218 2005

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 10: The Discriminative Lexicon: A Unified Computational Model ...downloads.hindawi.com/journals/complexity/2019/4895891.pdf · as developed within computational linguistics and com-puter

10 Complexity

SGPR

ESEN

TPE

RSIS

TEN

CE PLCO

NTI

NU

OU

SPA

ST LYG

ENPE

RSO

N3

PASS

IVE

CAN

FUTU

REU

ND

OU

ND

ERO

VER MIS

ISH

SELF IZ

EIN

STRU

MEN

TD

IS IST

OU

GH

T EE ISM

SUB

COM

PARA

TIV

ESU

PERL

ATIV

EPE

RFEC

TIV

EPE

RSO

N1

ION

ENCE IT

YN

ESS Y

SHA

LLO

RDIN

AL

OU

TAG

AIN

FUL

LESS

ABL

EAG

ENT

MEN

T ICO

US

NO

TIV

E

SGPRESENTPERSISTENCEPLCONTINUOUSPASTLYGENPERSON3PASSIVECANFUTUREUNDOUNDEROVERMISISHSELFIZEINSTRUMENTDISISTOUGHTEEISMSUBCOMPARATIVESUPERLATIVEPERFECTIVEPERSON1IONENCEITYNESSYSHALLORDINALOUTAGAINFULLESSABLEAGENTMENTICOUSNOTIVE

minus02 0 01 02Value

020

060

010

00Color Key

and Histogram

Cou

nt

Figure 2 Heatmap for the correlation matrix of the row vectors in S of derivational and inflectional function lexomes (individual words willbecome legible by zooming in on the figure at maximal magnification)

10 permutations 0519ndash0537) From this we conclude thatderived words show more clustering in the semantic space ofS than can be expected under randomness

How are the semantic vectors of derivational lexomespositioned with respect to the clusters of their derived wordsTo address this question we constructed heatmaps for thecorrelation matrices of the pertinent semantic vectors Anexample of such a heatmap is presented for ness in Figure 3Apart from the existence of clusters within the cluster of nesscontent lexomes it is striking that the ness lexome itself isfound at the very left edge of the dendrogram and at the very

left column and bottom row of the heatmapThe color codingindicates that surprisingly the ness derivational lexome isnegatively correlated with almost all content lexomes thathave ness as formative Thus the semantic vector of nessis not a prototype at the center of its cloud of exemplarsbut an antiprototype This vector is close to the cloud ofsemantic vectors but it is outside its periphery This patternis not specific to ness but is found for the other derivationallexomes as well It is intrinsic to our model

The reason for this is straightforward During learningalthough for a derived wordrsquos lexome 119894 and its derivational

Complexity 11

NES

Ssle

eple

ssne

ssas

sert

iven

ess

dire

ctne

ssso

undn

ess

bign

ess

perm

issiv

enes

sap

prop

riate

ness

prom

ptne

sssh

rew

dnes

sw

eakn

ess

dark

ness

wild

erne

ssbl

ackn

ess

good

ness

kind

ness

lone

lines

sfo

rgiv

enes

ssti

llnes

sco

nsci

ousn

ess

happ

ines

ssic

knes

saw

aren

ess

blin

dnes

sbr

ight

ness

wild

ness

mad

ness

grea

tnes

sso

ftnes

sth

ickn

ess

tend

erne

ssbi

ttern

ess

effec

tiven

ess

hard

ness

empt

ines

sw

illin

gnes

sfo

olish

ness

eage

rnes

sne

atne

sspo

liten

ess

fresh

ness

high

ness

nerv

ousn

ess

wet

ness

unea

sines

slo

udne

ssco

olne

ssid

lene

ssdr

ynes

sbl

uene

ssda

mpn

ess

fond

ness

liken

ess

clean

lines

ssa

dnes

ssw

eetn

ess

read

ines

scle

vern

ess

noth

ingn

ess

cold

ness

unha

ppin

ess

serio

usne

ssal

ertn

ess

light

ness

uglin

ess

flatn

ess

aggr

essiv

enes

sse

lfminusco

nsci

ousn

ess

ruth

lessn

ess

quie

tnes

sva

guen

ess

shor

tnes

sbu

sines

sho

lines

sop

enne

ssfit

ness

chee

rful

ness

earn

estn

ess

right

eous

ness

unpl

easa

ntne

ssqu

ickn

ess

corr

ectn

ess

disti

nctiv

enes

sun

cons

ciou

snes

sfie

rcen

ess

orde

rline

ssw

hite

ness

talln

ess

heav

ines

sw

icke

dnes

sto

ughn

ess

lawle

ssne

ssla

zine

ssnu

mbn

ess

sore

ness

toge

ther

ness

resp

onsiv

enes

scle

arne

ssne

arne

ssde

afne

ssill

ness

dizz

ines

sse

lfish

ness

fatn

ess

rest

lessn

ess

shyn

ess

richn

ess

thor

ough

ness

care

less

ness

tired

ness

fairn

ess

wea

rines

stig

htne

ssge

ntle

ness

uniq

uene

ssfr

iend

lines

sva

stnes

sco

mpl

eten

ess

than

kful

ness

slow

ness

drun

kenn

ess

wre

tche

dnes

sfr

ankn

ess

exac

tnes

she

lples

snes

sat

trac

tiven

ess

fulln

ess

roug

hnes

sm

eann

ess

hope

lessn

ess

drow

sines

ssti

ffnes

sclo

sene

ssgl

adne

ssus

eful

ness

firm

ness

rude

ness

bold

ness

shar

pnes

sdu

llnes

sw

eigh

tless

ness

stra

ngen

ess

smoo

thne

ss

NESSsleeplessnessassertivenessdirectnesssoundnessbignesspermissivenessappropriatenesspromptnessshrewdnessweaknessdarknesswildernessblacknessgoodnesskindnesslonelinessforgivenessstillnessconsciousnesshappinesssicknessawarenessblindnessbrightnesswildnessmadnessgreatnesssoftnessthicknesstendernessbitternesseffectivenesshardnessemptinesswillingnessfoolishnesseagernessneatnesspolitenessfreshnesshighnessnervousnesswetnessuneasinessloudnesscoolnessidlenessdrynessbluenessdampnessfondnesslikenesscleanlinesssadnesssweetnessreadinessclevernessnothingnesscoldnessunhappinessseriousnessalertnesslightnessuglinessflatnessaggressivenessselfminusconsciousnessruthlessnessquietnessvaguenessshortnessbusinessholinessopennessfitnesscheerfulnessearnestnessrighteousnessunpleasantnessquicknesscorrectnessdistinctivenessunconsciousnessfiercenessorderlinesswhitenesstallnessheavinesswickednesstoughnesslawlessnesslazinessnumbnesssorenesstogethernessresponsivenessclearnessnearnessdeafnessillnessdizzinessselfishnessfatnessrestlessnessshynessrichnessthoroughnesscarelessnesstirednessfairnesswearinesstightnessgentlenessuniquenessfriendlinessvastnesscompletenessthankfulnessslownessdrunkennesswretchednessfranknessexactnesshelplessnessattractivenessfullnessroughnessmeannesshopelessnessdrowsinessstiffnessclosenessgladnessusefulnessfirmnessrudenessboldnesssharpnessdullnessweightlessnessstrangenesssmoothness

Color Keyand Histogram

0 05minus05

Value

020

0040

0060

0080

00C

ount

Figure 3 Heatmap for the correlation matrix of lexomes for content words with ness as well as the derivational lexome of ness itself Thislexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes (Individual words willbecome legible by zooming in on the figure at maximal magnification)

lexome ness are co-present cues the derivational lexomeoccurs in many other words 119895 and each time anotherword 119895 is encountered weights are reduced from ness to119894 As this happens for all content lexomes the derivationallexome is during learning slowly but steadily discriminatedaway from its content lexomes We shall see that this is animportant property for our model to capture morphologicalproductivity for comprehension and speech production

When the additive model of Mitchell and Lapata [67]is used to construct a semantic vector for ness ie when

the average vector is computed for the vectors obtained bysubtracting the vector of the derived word from that of thebase word the result is a vector that is embedded insidethe cluster of derived vectors and hence inherits semanticidiosyncracies from all these derived words

324 Semantic Plausibility and Transparency Ratings forDerived Words In order to obtain further evidence for thevalidity of inflectional and derivational lexomes we re-analyzed the semantic plausibility judgements for word pairs

12 Complexity

consisting of a base and a novel derived form (eg accentaccentable) reported byMarelli and Baroni [60] and availableat httpcliccimecunitnitcomposesFRACSSThe number of pairs available to us for analysis 236 wasrestricted compared to the original 2559 pairs due to theconstraints that the base words had to occur in the tasacorpus Furthermore we also explored the role of wordsrsquoemotional valence arousal and dominance as availablein Warriner et al [83] which restricted the number ofitems even further The reason for doing so is that humanjudgements such as those of age of acquisition may reflectdimensions of emotion [59]

A measure derived from the semantic vectors of thederivational lexomes activation diversity and the L1-normof the semantic vector (the sum of the absolute values of thevector elements ie its city-block distance) turned out tobe predictive for these plausibility ratings as documented inTable 3 and the left panel of Figure 4 Activation diversity is ameasure of lexicality [58] In an auditory word identificationtask for instance speech that gives rise to a low activationdiversity elicits fast rejections whereas speech that generateshigh activation diversity elicits higher acceptance rates but atthe cost of longer response times [18]

Activation diversity interacted with word length Agreater word length had a strong positive effect on ratedplausibility but this effect progressively weakened as acti-vation diversity increases In turn activation diversity hada positive effect on plausibility for shorter words and anegative effect for longer words Apparently the evidencefor lexicality that comes with higher L1-norms contributespositively when there is little evidence coming from wordlength As word length increases and the relative contributionof the formative in theword decreases the greater uncertaintythat comes with higher lexicality (a greater L1-norm impliesmore strong links to many other lexomes) has a detrimentaleffect on the ratings Higher arousal scores also contributedto higher perceived plausibility Valence and dominance werenot predictive and were therefore left out of the specificationof themodel reported here (word frequency was not includedas predictor because all derived words are neologisms withzero frequency addition of base frequency did not improvemodel fit 119901 gt 091)

The degree to which a complex word is semanticallytransparent with respect to its base word is of both practi-cal and theoretical interest An evaluation of the semantictransparency of complex words using transformation matri-ces is developed in Marelli and Baroni [60] Within thepresent framework semantic transparency can be examinedstraightforwardly by comparing the correlations of (i) thesemantic vectors of the base word to which the semanticvector of the affix is added with (ii) the semantic vector ofthe derived word itself The more distant the two vectors arethe more negative their correlation should be This is exactlywhat we find For ness for instance the six most negativecorrelations are business (119903 = minus066) effectiveness (119903 = minus051)awareness (119903 = minus050) loneliness (119903 = minus045) sickness (119903 =minus044) and consciousness (119903 = minus043) Although ness canhave an anaphoric function in discourse [84] words such asbusiness and consciousness have a much deeper and richer

semantics than just reference to a previously mentioned stateof affairs A simple comparison of the wordrsquos actual semanticvector in S (derived from the TASA corpus) and its semanticvector obtained by accumulating evidence over base and affix(ie summing the vectors of base and affix) brings this outstraightforwardly

However evaluating correlations for transparency isprone to researcher bias We therefore also investigated towhat extent the semantic transparency ratings collected byLazaridou et al [68] can be predicted from the activationdiversity of the semantic vector of the derivational lexomeThe original dataset of Lazaridou and colleagues comprises900 words of which 330 meet the criterion of occurringmore than 8 times in the tasa corpus and for which wehave semantic vectors available The summary of a Gaussianlocation-scale additivemodel is given in Table 4 and the rightpanel of Figure 4 visualizes the interaction of word lengthby activation diversity Although modulated by word lengthoverall transparency ratings decrease as activation diversityis increased Apparently the stronger the connections aderivational lexome has with other lexomes the less clearit becomes what its actual semantic contribution to a novelderived word is In other words under semantic uncertaintytransparency ratings decrease

4 Comprehension

Now that we have established that the present semanticvectors make sense even though they are based on a smallcorpus we next consider a comprehension model that hasform vectors as input and semantic vectors as output (a pack-age for R implementing the comprehension and productionalgorithms of linear discriminative learning is available athttpwwwsfsuni-tuebingendesimhbaayenpublicationsWpmWithLdl 10targz The package isdescribed in Baayen et al [85]) We begin with introducingthe central concepts underlying mappings from form tomeaning We then discuss visual comprehension and thenturn to auditory comprehension

41 Setting Up the Mapping Let C denote the cue matrix amatrix that specifies for each word (rows) the form cues ofthat word (columns) For a toy lexicon with the words onetwo and three the Cmatrix is

C

= wV wVn Vn tu tu Tr Tri rionetwothree

( 100100

100010

010001

001001 ) (12)

where we use the disc keyboard phonetic alphabet (theldquoDIstinct SingleCharacterrdquo representationwith one characterfor each phoneme was introduced by CELEX [86]) fortriphones the symbol denotes the word boundary Supposethat the semantic vectors for these words are the row vectors

Complexity 13

26 27

28

29

29

3 3

31

31

32

32

33

33

34

34

35

35

3

3

35

4

45 36

38

40

42

44

Activ

atio

n D

iver

sity

of th

e Der

ivat

iona

l Lex

ome

8 10 12 14 16 186Word Length

36

38

40

42

44

Activ

atio

n D

iver

sity

of th

e Der

ivat

iona

l Lex

ome

6 8 10 12 144Word Length

Figure 4 Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantictransparency ratings (right)

Table 3 GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able again agent ist less and notdata from Marelli and Baroni [60]

A parametric coefficients Estimate Std Error t-value p-valueIntercept -18072 27560 -06557 05127Word Length 05901 02113 27931 00057Activation Diversity 11221 06747 16632 00976Arousal 03847 01521 25295 00121Word Length times Activation Diversity -01318 00500 -26369 00089B smooth terms edf Refdf F-value p-valueRandom Intercepts Affix 33610 40000 556723 lt 00001

of the following matrix S

S = one two threeonetwothree

( 100201031001

040110 ) (13)

We are interested in a transformation matrix 119865 such that

CF = S (14)

The transformation matrix is straightforward to obtain LetC1015840 denote the Moore-Penrose generalized inverse of Cavailable in R as the ginv function in theMASS package [82]Then

F = C1015840S (15)

For the present example

F =one two three

wVwVnVntutuTrTriri

((((((((((((

033033033010010003003003

010010010050050003003003

013013013005005033033033

))))))))))))

(16)

and for this simple example CF is exactly equal to SIn the remainder of this section we investigate how well

this very simple end-to-end model performs for visual wordrecognition as well as auditory word recognition For visualword recognition we use the semantic matrix S developed inSection 3 but we consider two different cue matrices C oneusing letter trigrams (following [58]) and one using phonetrigramsA comparison of the performance of the twomodels

14 Complexity

Table 4 Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words te tensor product smooth ands thin plate regression spline

A parametric coefficients Estimate Std Error t-value p-valueintercept [location] 55016 02564 214537 lt 00001intercept [scale] -04135 00401 -103060 lt 00001B smooth terms edf Refdf F-value p-valuete(activation diversity word length) [location] 53879 62897 237798 00008s(derivational lexome) [location] 143180 150000 3806254 lt 00001s(activation diversity) [scale] 21282 25993 284336 lt 00001s(word length) [scale] 17722 21403 1327311 lt 00001

sheds light on the role of wordsrsquo ldquosound imagerdquo on wordrecognition in reading (cf [87ndash89] for phonological effectsin visual word Recognition) For auditory word recognitionwe make use of the acoustic features developed in Arnold etal [18]

Although the features that we selected as cues are basedon domain knowledge they do not have an ontological statusin our framework in contrast to units such as phonemes andmorphemes in standard linguistic theories In these theoriesphonemes and morphemes are Russelian atomic units offormal phonological and syntactic calculi and considerableresearch has been directed towards showing that these unitsare psychologically real For instance speech errors involvingmorphemes have been interpreted as clear evidence for theexistence in the mind of morphemes [90] Research hasbeen directed at finding behavioral and neural correlates ofphonemes and morphemes [6 91 92] ignoring fundamentalcriticisms of both the phoneme and the morpheme astheoretical constructs [25 93 94]

The features that we use for representing aspects of formare heuristic features that have been selected or developedprimarily because they work well as discriminators We willgladly exchange the present features for other features if theseother features can be shown to afford higher discriminativeaccuracy Nevertheless the features that we use are groundedin domain knowledge For instance the letter trigrams usedfor modeling reading (see eg[58]) are motivated by thefinding in stylometry that letter trigrams are outstandingfeatures for discriminating between authorial hands [95] (inwhat follows we work with letter triplets and triphoneswhich are basically contextually enriched letter and phoneunits For languages with strong phonotactic restrictionssuch that the syllable inventory is quite small (eg Viet-namese) digraphs work appear to work better than trigraphs[96] Baayen et al [85] show that working with four-gramsmay enhance performance and current work in progress onmany other languages shows that for highly inflecting lan-guages with long words 4-grams may outperform 3-gramsFor computational simplicity we have not experimented withmixtures of units of different length nor with algorithmswithwhich such units might be determined) Letter n-grams arealso posited by Cohen and Dehaene [97] for the visual wordform system at the higher endof a hierarchy of neurons tunedto increasingly large visual features of words

Triphones the units that we use for representing thelsquoacoustic imagersquo or lsquoauditory verbal imageryrsquo of canonicalword forms have the same discriminative potential as lettertriplets but have as additional advantage that they do betterjustice compared to phonemes to phonetic contextual inter-dependencies such as plosives being differentiated primarilyby formant transitions in adjacent vowels The acousticfeatures that we use for modeling auditory comprehensionto be introduced in more detail below are motivated in partby the sensitivity of specific areas on the cochlea to differentfrequencies in the speech signal

Evaluation of model predictions proceeds by comparingthe predicted semantic vector s obtained by multiplying anobserved cue vector c with the transformation matrix F (s =cF) with the corresponding target row vector s of S A word 119894is counted as correctly recognized when s119894 is most stronglycorrelated with the target semantic vector s119894 of all targetvectors s119895 across all words 119895

Inflected words do not have their own semantic vectors inS We therefore created semantic vectors for inflected wordsby adding the semantic vectors of stem and affix and addedthese as additional row vectors to S before calculating thetransformation matrix F

We note here that there is only one transformationmatrix ie one discrimination network that covers allaffixes inflectional and derivational as well as derived andmonolexomic (simple) words This approach contrasts withthat of Marelli and Baroni [60] who pair every affix with itsown transformation matrix

42 Visual Comprehension For visual comprehension wefirst consider a model straightforwardly mapping form vec-tors onto semantic vectors We then expand the model withan indirect route first mapping form vectors onto the vectorsfor the acoustic image (derived from wordsrsquo triphone repre-sentations) and thenmapping the acoustic image vectors ontothe semantic vectors

The dataset on which we evaluated our models com-prised 3987 monolexomic English words 6595 inflectedvariants of monolexomic words and 898 derived words withmonolexomic base words to a total of 11480 words Thesecounts follow from the simultaneous constraints of (i) a wordappearing with sufficient frequency in TASA (ii) the wordbeing available in the British Lexicon Project (BLP [98]) (iii)

Complexity 15

the word being available in the CELEX database and theword being of the abovementioned morphological type Inthe present study we did not include inflected variants ofderived words nor did we include compounds

421 The Direct Route Straight from Orthography to Seman-tics To model single word reading as gauged by the visuallexical decision task we used letter trigrams as cues In ourdataset there were a total of 3465 unique trigrams resultingin a 11480 times 3465 orthographic cue matrix C The semanticvectors of the monolexomic and derived words were takenfrom the semantic weight matrix described in Section 3 Forinflected words semantic vectors were obtained by summa-tion of the semantic vectors of base words and inflectionalfunctions For the resulting semantic matrix S we retainedthe 5030 column vectors with the highest variances settingthe cutoff value for the minimal variance to 034times10minus7 FromS and C we derived the transformation matrix F which weused to obtain estimates s of the semantic vectors s in S

For 59 of the words s had the highest correlation withthe targeted semantic vector s (for the correctly predictedwords the mean correlation was 083 and the median was086 With respect to the incorrectly predicted words themean and median correlation were both 048) The accuracyobtained with naive discriminative learning using orthog-onal lexomes as outcomes instead of semantic vectors was27Thus as expected performance of linear discriminationlearning (ldl) is substantially better than that of naivediscriminative learning (ndl)

To assess whether this model is productive in the sensethat it canmake sense of novel complex words we considereda separate set of inflected and derived words which were notincluded in the original dataset For an unseen complexwordboth base word and the inflectional or derivational functionappeared in the training set but not the complex word itselfThe network F therefore was presented with a novel formvector which it mapped onto a novel vector in the semanticspace Recognition was successful if this novel vector wasmore strongly correlated with the semantic vector obtainedby summing the semantic vectors of base and inflectional orderivational function than with any other semantic vector

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated in the mean with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all 6,595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derived words' predicted semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping an 11,480 × 5,929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90 and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ1 obtained with network F of the first route, and a vector ŝ2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
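
A minimal sketch of how such a dual-route set-up can be composed (numpy assumed; all names are ours): one network maps trigrams directly onto semantics, a second network maps trigrams onto triphones, and a third maps the predicted triphone vectors onto semantics, yielding the two estimates ŝ1 and ŝ2 for every word.

    import numpy as np

    def dual_route(C_tri, C_phon, S):
        """Direct and indirect routes from orthography to semantics.

        C_tri:  (n x k1) letter-trigram matrix
        C_phon: (n x k2) triphone matrix (auditory targets)
        S:      (n x d)  semantic matrix
        """
        F = np.linalg.pinv(C_tri) @ S         # route 1: trigrams -> semantics
        K = np.linalg.pinv(C_tri) @ C_phon    # trigrams -> triphones
        T_hat = C_tri @ K                     # predicted triphone support
        H = np.linalg.pinv(T_hat) @ S         # route 2: predicted triphones -> semantics

        S1 = C_tri @ F                        # route-1 semantic estimates (s1-hat)
        S2 = T_hat @ H                        # route-2 semantic estimates (s2-hat)
        return S1, S2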

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11,480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
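
Under the assumption that ŝ1, ŝ2, and S are available as numpy arrays, the three measures can be computed as in the following sketch (names are ours):

    import numpy as np

    def ldl_measures(s1, s2, S, j):
        """Predictors for lexical decision RTs for the word/lexome with column index j.

        s1, s2 : semantic vectors predicted by the direct and indirect route
        S      : semantic matrix (columns indexed by lexomes)
        """
        total_activation_diversity = np.abs(s1).sum() + np.abs(s2).sum()
        route_congruency = np.corrcoef(s1, s2)[0, 1]
        prior = np.abs(S[:, j]).sum()        # L1-norm of the lexome's column in S
        return total_activation_diversity, route_congruency, prior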

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel): for all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors a greater route congruency afforded further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed up the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel): for medium range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign, going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 6,510.1 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increased by 1,005.9 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increased by no less than 1,479.0 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger


Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients          Estimate    Std. Error   t-value     p-value
intercept                           -1.7774     0.0087       -205.414    < 0.0001
word type: Inflected                 0.1110     0.0059         18.742    < 0.0001
word type: Monomorphemic             0.0233     0.0054          4.322    < 0.0001
word length                          0.0117     0.0011         10.981    < 0.0001

B. smooth terms                      edf        Ref.df       F           p-value
s(total activation diversity)        6.002      7.222         23.6       < 0.0001
te(route congruency, prior)         14.673     17.850        213.7       < 0.0001

Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients                      Estimate    Std. Error   t-value     p-value
intercept                                       -1.6934     0.0086       -195.933    < 0.0001
word type: Inflected                             0.0286     0.0052          5.486    < 0.0001
word type: Monomorphemic                         0.0042     0.0055          0.770      0.44
word length                                      0.0066     0.0011          5.893    < 0.0001

B. smooth terms                                 edf         Ref.df       F           p-value
te(activation, activation diversity, prior)     41.34       49.72        108.4       < 0.0001

than the number of trigram features (3,465 trigrams versus 5,929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations; https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises of what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.
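
The FBSFs themselves are extracted with the AcousticNDLCodeR package mentioned above; purely as an illustration of the chunking step, the sketch below (scipy assumed; the smoothing window is our choice and not a parameter of that package) locates candidate chunk boundaries at minima of a smoothed Hilbert amplitude envelope.

    import numpy as np
    from scipy.signal import hilbert, argrelmin

    def chunk_boundaries(wave, sr, smooth_ms=20):
        """Chunk a waveform at minima of its smoothed Hilbert amplitude envelope."""
        envelope = np.abs(hilbert(wave))                 # amplitude envelope
        win = max(1, int(sr * smooth_ms / 1000))
        kernel = np.ones(win) / win
        smoothed = np.convolve(envelope, kernel, mode="same")
        minima = argrelmin(smoothed, order=win)[0]       # candidate boundary samples
        return np.concatenate(([0], minima, [len(wave)]))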

We tested ldl on 20 hours of speech sampled from the audio files of the ucla library broadcast newsscape data, a vast repository of multimodal TV news broadcasts,


Figure 7: Oscillogram for two different realizations of the word crisis with different degrees of reduction in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as 'clean' for relatively clean parts where there is speech without background noise or music, and as 'noisy' for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131,673 word tokens (representing 4,779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4,609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus ( 3 8 ; 7 2 )^T = ( 3 7 ; 8 2 ). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

  CF = S
  C^T CF = C^T S
  (C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
  F = (C^T C)^{-1} C^T S.          (17)

In this way, matrix inversion is required only for a much smaller matrix C^T C, which is a square matrix of size 40,639 × 40,639.
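
In code, this amounts to solving the normal equations instead of computing a generalized inverse of the much larger C; a sketch under the assumption that numpy is available (the optional ridge term is our addition for numerical stability, not part of the calculation described above):

    import numpy as np

    def solve_F_normal_equations(C, S, ridge=0.0):
        """Solve C F = S via the normal equations (C^T C) F = C^T S.

        Only the k x k matrix C^T C (k = number of cues) has to be inverted,
        not the much larger n x k matrix C itself.
        """
        CtC = C.T @ C
        if ridge > 0:                       # optional stabilizer (our addition)
            CtC = CtC + ridge * np.eye(CtC.shape[0])
        return np.linalg.solve(CtC, C.T @ S)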

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model: when presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11,480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating

further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

  S =
            one    two    three
    one     1.0    0.3    0.4
    two     0.2    1.0    0.1
    three   0.1    0.1    1.0          (18)

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:

Complexity 21

  T =
            #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
    one      1     1     1     0     0     0     0     0
    two      0     0     0     1     1     0     0     0
    three    0     0     0     0     0     1     1     1          (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

  SG = T.          (20)

Given G, we can predict for any semantic vector s the vector of triphones t̂ that quantifies the support for the triphones provided by s, simply by multiplying s with G:

  t̂ = sG.          (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

  S′SG = S′T,          (22)

we have

  G = S′T.          (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

  SG = T̂.          (24)

For the present example, the inverse of S, S′, is

  S′ =
            one      two     three
    one      1.10   -0.29   -0.41
    two     -0.21    1.07   -0.02
    three   -0.09   -0.08    1.04          (25)

and the transformation matrix G is

  G =
            #wV    wVn    Vn#    #tu    tu#    #Tr    Tri    ri#
    one     1.10   1.10   1.10  -0.29  -0.29  -0.41  -0.41  -0.41
    two    -0.21  -0.21  -0.21   1.07   1.07  -0.02  -0.02  -0.02
    three  -0.09  -0.09  -0.09  -0.08  -0.08   1.04   1.04   1.04          (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
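
The toy example can be checked numerically; the sketch below (numpy assumed) rebuilds G from the S and T matrices given in (18) and (19) and verifies that SG recovers T.

    import numpy as np

    # Semantic matrix S for one, two, three (rows), cf. (18)
    S = np.array([[1.0, 0.3, 0.4],
                  [0.2, 1.0, 0.1],
                  [0.1, 0.1, 1.0]])

    # Triphone matrix T, cf. (19): columns #wV wVn Vn# #tu tu# #Tr Tri ri#
    T = np.array([[1, 1, 1, 0, 0, 0, 0, 0],
                  [0, 0, 0, 1, 1, 0, 0, 0],
                  [0, 0, 0, 0, 0, 1, 1, 1]], dtype=float)

    G = np.linalg.pinv(S) @ T     # production network, cf. (23)
    print(np.round(G, 2))         # each triphone column copies the S-inverse
                                  # column of the word containing that triphone
    print(np.allclose(S @ G, T))  # True: T is recovered exactly in this toy case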

We made use of the S matrix which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies, for each of w words, an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3,987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3,987 × 4,446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3,987 × 4,446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4,446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all simple paths function from the igraph package [127] (a path is simple if it does not visit any vertex more than once) to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the


longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
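
A sketch of this path-based ordering step is given below (networkx assumed; the threshold, the boundary marker #, and all function names are ours). It thresholds the predicted triphone activations, links triphones that overlap in two segments, and reads the word off the longest path from an initial to a final triphone.

    import itertools
    import networkx as nx

    def order_triphones(support, threshold=0.99):
        """Reassemble a word's form from its best-supported triphones.

        support: dict mapping triphone strings such as '#wV', 'wVn', 'Vn#'
                 to their predicted activation; '#' marks a word boundary.
        """
        active = [t for t, a in support.items() if a > threshold]
        g = nx.DiGraph()
        g.add_nodes_from(active)
        for a, b in itertools.permutations(active, 2):
            if a[1:] == b[:-1]:                 # e.g. 'wVn' can be followed by 'Vn#'
                g.add_edge(a, b)
        starts = [t for t in active if t.startswith("#")]
        ends = [t for t in active if t.endswith("#")]
        # a triphone flanked by two boundaries is already a complete word
        paths = [[t] for t in active if t.startswith("#") and t.endswith("#")]
        paths += [p for s in starts for e in ends if s != e
                  for p in nx.all_simple_paths(g, s, e)]
        if not paths:
            return None
        best = max(paths, key=len)              # prefer the longest path
        return best[0] + "".join(t[-1] for t in best[1:])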

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3,982 out of 3,987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4,000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3,987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m_a} from S_m and add the semantic vector s_a of the affix:

  S_a = S_{m_a} + i ⊗ s_a.          (27)

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a,

  S_a G_a = T_a,          (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

  [ S_m ]          [ T_m ]
  [ S_a ]  G_a  =  [ T_a ].          (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

  [ S_m  ]        [ T_m  ]
  [ S_a1 ]        [ T_a1 ]
  [ S_a2 ]  G  =  [ T_a2 ].          (30)
  [ ...  ]        [ ...  ]
  [ S_an ]        [ T_an ]

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
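
A sketch of how the jointly augmented matrices of (30) can be assembled and solved (numpy assumed; the input format is our assumption):

    import numpy as np

    def augmented_mapping(S_m, T_m, inflected):
        """Estimate a single production network G for base and inflected words.

        S_m, T_m : semantic and triphone matrices of the monolexomic base words
        inflected: list of (base_index, s_affix, t_row) triples, where s_affix is
                   the semantic vector of the inflectional function and t_row the
                   triphone vector of the inflected form
        """
        S_rows = [S_m] + [S_m[i] + s_a for i, s_a, _ in inflected]
        T_rows = [T_m] + [t for _, _, t in inflected]
        S_aug = np.vstack(S_rows)
        T_aug = np.vstack(T_rows)
        G = np.linalg.pinv(S_aug) @ T_aug      # one mapping for all functions
        return G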

We selected 6,595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2,401 plurals, 1,333 continuous forms (e.g., walking), 859 past tense forms, and 1,086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10,582 × 5,483 semantic matrix S, where, as before, we retained the 5,483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6,236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4,885 words (rows) by 4,993 (columns) and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3,987 monolexomic words. For the resulting 4,885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15,105 tokens of a total of 1,327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.

The adverse effects of weaknesses in a word's path in the graph where the path branches are of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients       Estimate      Std. Error   t-value     p-value
Intercept                        -1.9165       0.0592       -32.3908    < 0.0001
Log LDL Edge Weight              -0.1358       0.0400        -3.3960      0.0007
Phrase Rate                       0.0047       0.0021         2.2449      0.0248
Log Ncount                        0.1114       0.0189         5.8837    < 0.0001

B. smooth terms                  edf           Ref.df       F-value     p-value
TPRS log Frequency                  1.9007        1.9050      7.5495      0.0004
random intercepts word           1149.8839     1323.0000     31.4919    < 0.0001
random intercepts speaker          30.2995       39.0000     34.5579    < 0.0001

maps (TSOMs; [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm, but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to prevent graphs from becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory, but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).
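
As noted above, the two networks need not be estimated with matrix operations; they can also be trained incrementally with the learning rule of Widrow and Hoff [44]. A minimal sketch of a single such update (numpy assumed; the learning rate is our choice):

    import numpy as np

    def widrow_hoff_update(W, cue_vec, target_vec, rate=0.001):
        """One incremental learning step for a linear network W (cues x outcomes).

        The weights are nudged so that the cues present in this learning event
        predict the target vector a little better (delta rule / Widrow-Hoff).
        """
        prediction = cue_vec @ W
        error = target_vec - prediction
        return W + rate * np.outer(cue_vec, error)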

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L, specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...), was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
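
A sketch of this toy simulation (numpy assumed; helper names are ours): letter trigrams of each fixated orthographic word are paired with the lexome of the named entity, F is solved from CF = L, and the vectors generated by successive fixations are summed.

    import numpy as np

    def trigrams(word):
        """Letter trigrams of an orthographic word, '#' marking its edges."""
        w = "#" + word + "#"
        return [w[i:i + 3] for i in range(len(w) - 2)]

    def fit_name_reader(forms, lexomes):
        """Solve CF = L for word forms paired with named-entity lexomes."""
        cues = sorted({t for f in forms for t in trigrams(f)})
        outs = sorted(set(lexomes))
        ci = {c: i for i, c in enumerate(cues)}
        oi = {o: i for i, o in enumerate(outs)}
        C = np.zeros((len(forms), len(cues)))
        L = np.zeros((len(forms), len(outs)))
        for r, (f, lex) in enumerate(zip(forms, lexomes)):
            for t in trigrams(f):
                C[r, ci[t]] = 1.0
            L[r, oi[lex]] = 1.0
        F = np.linalg.pinv(C) @ L

        def read(fixated_words):
            # sum the vectors generated by successive fixations (e.g. John, Welsch)
            est = np.zeros(len(outs))
            for w in fixated_words:
                c = np.zeros(len(cues))
                for t in trigrams(w):
                    if t in ci:
                        c[ci[t]] = 1.0
                est += c @ F
            return est, outs
        return read

    # e.g. reader = fit_name_reader(["John", "Clark", "John", "Welsch"],
    #                               ["JohnClark", "JohnClark", "JohnWelsch", "JohnWelsch"])
    # estimates, labels = reader(["John", "Welsch"])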

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student

Figure 10: Correlations of the estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. (Four panels, one per estimated vector; y-axes show Pearson correlations in the range -0.5 to 1.0.)

can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].
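
In the same toy setting as the comprehension sketch above (again an illustration with invented matrices, not the study's implementation), a production network such as G_a can be estimated in the reverse direction, from semantic vectors to triphone target vectors:

    # Toy illustration: estimating a meaning-to-form (production) mapping.
    library(MASS)
    set.seed(2)
    S  <- matrix(rnorm(5 * 4), nrow = 5)            # 5 words, 4-dimensional semantic vectors
    T_ <- matrix(rbinom(5 * 10, 1, 0.3), nrow = 5)  # 5 words coded by 10 binary triphone targets
    Ga <- ginv(S) %*% T_                            # solve S Ga = T in the least-squares sense
    round(S %*% Ga, 2)                              # graded support for each triphone per word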

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see [47] for some evidence). One such interaction is explored


[Figure 11 diagram: input systems (cochlea, retina), auditory cues (FBS features), orthographic cues (trigrams), semantic vectors (TASA), auditory targets (triphones), orthographic targets (trigrams), and output systems (speaking, typing); the arcs are labeled with the networks C_a, C_o, F_a, F_o, K_a, H_a, G_a, G_o, T_a, T_o.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally

whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
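
A minimal sketch of such trial-by-trial estimation, assuming the standard Widrow-Hoff (delta) rule and toy cue and target matrices of our own invention (the learning rate is likewise illustrative), is given below: after each learning event the connection weights are nudged in the direction that reduces the current prediction error.

    # Toy sketch of incremental updating with the Widrow-Hoff (delta) rule.
    set.seed(3)
    cues    <- matrix(rbinom(6 * 5, 1, 0.4), nrow = 6)  # 6 learning events, 5 cues
    targets <- matrix(rnorm(6 * 3), nrow = 6)           # 6 learning events, 3 outcome dimensions
    W <- matrix(0, nrow = ncol(cues), ncol = ncol(targets))
    rate <- 0.1                                         # learning rate (illustrative value)
    for (i in seq_len(nrow(cues))) {
      pred  <- as.vector(cues[i, ] %*% W)               # current prediction for this event
      error <- targets[i, ] - pred                      # prediction error
      W     <- W + rate * outer(cues[i, ], error)       # adjust weights from the active cues
    }
    W                                                   # incrementally estimated network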

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over a thousand function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active),

tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension (we note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research).

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change


considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of

the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of


categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40, 41, 42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What

is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning, and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

$\mathbf{a} = (3, 4), \quad \mathbf{b} = (-2, -3)$    (A1)


[Figure 12 plots the points a, b, x, and y in the x-y plane, with both axes running from -4 to 4.]

Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

$\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}$    (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

$\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}$    (A3)

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

$\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}$    (A4)

Such a mapping of A onto B is given by a matrix F

$\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}$    (A5)

and it is straightforward to verify that indeed

$\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}$    (A6)

Using matrix notation we can write

$\mathbf{A}\mathbf{F} = \mathbf{B}$    (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}$    (A8)

For numbers, the inverse of multiplication with s is dividing by s:

$\left(\frac{1}{s}\right)(sx) = x$    (A9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

$\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}$    (A10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

$\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}$    (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

$\mathbf{A}\mathbf{F} = \mathbf{B}$
$\mathbf{A}^{-1}\mathbf{A}\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}$
$\mathbf{I}\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}$
$\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}$    (A12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).
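
The worked example can be reproduced in a few lines of R (a sketch for illustration; we use the name Fmat because F is reserved as an alias for FALSE in R):

    # Reproducing the toy comprehension mapping: F = A^{-1} B.
    A <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows are the form vectors a and b
    B <- matrix(c(-5, 4, 2, -1), nrow = 2)   # rows are the semantic vectors x and y
    Fmat <- solve(A) %*% B                   # solve(A) returns the inverse of A
    Fmat                                     # the transformation matrix F of (A5)
    A %*% Fmat                               # recovers B, as in (A6)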

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

$\mathbf{B}\mathbf{G} = \mathbf{A}$    (A13)

Solving this equation proceeds exactly as above

$\mathbf{B}\mathbf{G} = \mathbf{A}$
$\mathbf{B}^{-1}\mathbf{B}\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}$
$\mathbf{I}\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}$
$\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}$    (A14)


The inverse of B is

$\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}$    (A15)

and hence we have

$\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G}$    (A16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
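
A brief R illustration of this point (our own toy matrix, not one from the study): solve() fails for a singular matrix, but MASS::ginv still returns the Moore-Penrose generalized inverse, which can be used in its place.

    # A singular (rank-deficient) matrix has no inverse, but it has a generalized inverse.
    library(MASS)
    X <- matrix(c(1, 2, 2, 4), nrow = 2)   # second row is twice the first: singular
    # solve(X) would throw an error here because X is singular
    Xg <- ginv(X)                          # Moore-Penrose generalized inverse (X' in the text)
    round(X %*% Xg %*% X, 10)              # equals X, one of the defining properties of ginv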

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

$\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix}$    (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

$\mathbf{s} = (2, 2)$    (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

$\begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix}$    (A19)
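
In R, continuing the sketch above, this productivity amounts to multiplying the novel row vector with the same transformation matrix:

    # Productivity of the linear map: an unseen form vector is handled by the same network.
    Fmat <- matrix(c(1, -2, 2, -1), nrow = 2)   # the transformation matrix F of (A5)
    s <- c(2, 2)                                # a novel form vector
    s %*% Fmat                                  # yields the novel semantic vector (-2, 2)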

[Figure 13 diagram: form features go in at the input nodes i1 and i2; the connection weights (1, 2, -2, -1) are the transformation matrix; semantic vectors come out at the output nodes o1 and o2.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5 and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word-initial boundary symbol) to a word's final triphone (the triphone ending with the word-final boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
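
The following heavily simplified R sketch (our own toy graph and invented support values, not the study's implementation) illustrates the core idea of enumerating candidate triphone paths with igraph and ranking them by length-weighted support.

    # Toy sketch of graph-based triphone sequencing with igraph.
    library(igraph)
    triphones <- c("#ka", "kat", "ats", "ts#")        # vertices; "#" marks a word boundary
    support   <- c(0.9, 0.8, 0.6, 0.7)                # invented network support per triphone
    names(support) <- triphones
    edges <- data.frame(from = triphones[-length(triphones)],  # consecutive triphones overlap
                        to   = triphones[-1])
    g <- graph_from_data_frame(edges, directed = TRUE,
                               vertices = data.frame(name = triphones))
    paths <- all_simple_paths(g, from = "#ka", to = "ts#")
    # weight each path's summed support by its length, then pick the best-supported path
    scores <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
    as_ids(paths[[which.max(scores)]])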

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected


[Figure 14: heatmap with color key and histogram; rows and columns are individual pronouns and prepositions.]

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer

morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for test classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis, Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL," R package version 1.0.1, 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csárdi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Bearman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide Learning for Auditory Comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss Functions for Image Restoration With Neural Networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.




Figure 3: Heatmap for the correlation matrix of lexomes for content words with ness, as well as the derivational lexome of ness itself. This lexome is found at the very left edge of the dendrograms and is negatively correlated with almost all content lexomes. (Individual words will become legible by zooming in on the figure at maximal magnification.)

lexome ness are co-present cues, the derivational lexome occurs in many other words j, and each time another word j is encountered, weights are reduced from ness to i. As this happens for all content lexomes, the derivational lexome is, during learning, slowly but steadily discriminated away from its content lexomes. We shall see that this is an important property for our model to capture morphological productivity for comprehension and speech production.

When the additive model of Mitchell and Lapata [67] is used to construct a semantic vector for ness, i.e., when the average vector is computed for the vectors obtained by subtracting the vector of the derived word from that of the base word, the result is a vector that is embedded inside the cluster of derived vectors and hence inherits semantic idiosyncracies from all these derived words.
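As a minimal R sketch of this additive construction, the following lines average the base/derived difference vectors (sign convention as in the description above) for a handful of hypothetical pairs; the row names and the "NESS" label are assumptions, not part of the dataset described here.

pairs <- data.frame(base    = c("kind", "dark", "soft"),
                    derived = c("kindness", "darkness", "softness"))
diffs <- S[pairs$base, ] - S[pairs$derived, ]   # difference vectors, as described above
ness_additive <- colMeans(diffs)                # additive estimate of the ness vector
cor(ness_additive, S["NESS", ])                 # compare with the learned ness lexome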

3.2.4. Semantic Plausibility and Transparency Ratings for Derived Words. In order to obtain further evidence for the validity of inflectional and derivational lexomes, we re-analyzed the semantic plausibility judgements for word pairs consisting of a base and a novel derived form (e.g., accent, accentable) reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs due to the constraint that the base words had to occur in the TASA corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, i.e., its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies more strong links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive and were therefore left out of the specification of the model reported here (word frequency was not included as a predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve model fit, p > 0.91).
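A GAM with the structure summarized in Table 3 could be fitted with mgcv along the following lines; this is a hedged sketch, and the data frame plaus and its column names (rating, word_length, activation_diversity, arousal, affix) are our hypothetical stand-ins, not the original analysis script.

library(mgcv)
# 'plaus': one row per neologism; affix must be a factor for the random effect
m_plaus <- gam(rating ~ word_length * activation_diversity + arousal +
                 s(affix, bs = "re"),     # random intercepts for affix
               data = plaus, method = "ML")
summary(m_plaus)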

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by comparing the correlations of (i) the semantic vector of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = −0.66), effectiveness (r = −0.51), awareness (r = −0.50), loneliness (r = −0.45), sickness (r = −0.44), and consciousness (r = −0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) and its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the TASA corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.
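A Gaussian location-scale additive model of the kind summarized in Table 4 can be specified in mgcv with the gaulss family; again a hedged sketch with a hypothetical data frame transp and hypothetical column names (rating, activation_diversity, word_length, lexome).

library(mgcv)
m_transp <- gam(list(
    rating ~ te(activation_diversity, word_length) + s(lexome, bs = "re"),
    ~ s(activation_diversity) + s(word_length)),   # linear predictor for the scale
  data = transp, family = gaulss())
summary(m_transp)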

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output. (A package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz. The package is described in Baayen et al. [85].) We begin with introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

C =
             #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
    one        1     1     1     0     0     0     0     0
    two        0     0     0     1     1     0     0     0
    three      0     0     0     0     0     1     1     1        (12)

where we use the DISC keyboard phonetic alphabet (the "DIstinct Single Character" representation, with one character for each phoneme, was introduced by CELEX [86]); for triphones, the symbol # denotes the word boundary. Suppose that the semantic vectors for these words are the row vectors



Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients              Estimate   Std. Error   t-value    p-value
Intercept                                -1.8072       2.7560   -0.6557     0.5127
Word Length                               0.5901       0.2113    2.7931     0.0057
Activation Diversity                      1.1221       0.6747    1.6632     0.0976
Arousal                                   0.3847       0.1521    2.5295     0.0121
Word Length × Activation Diversity       -0.1318       0.0500   -2.6369     0.0089
B. smooth terms                               edf       Ref.df   F-value    p-value
Random Intercepts Affix                    3.3610       4.0000   55.6723   < 0.0001

of the following matrix S:

             one   two   three
S =   one    1.0   0.3   0.4
      two    0.2   1.0   0.1
      three  0.1   0.1   1.0                                      (13)

We are interested in a transformation matrix F such that

CF = S.                                                           (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S.                                                          (15)

For the present example,

             one   two   three
      #wV   0.33  0.10  0.13
      wVn   0.33  0.10  0.13
      Vn#   0.33  0.10  0.13
F =   #tu   0.10  0.50  0.05
      tu#   0.10  0.50  0.05
      #Tr   0.03  0.03  0.33
      Tri   0.03  0.03  0.33
      ri#   0.03  0.03  0.33                                      (16)

and for this simple example, CF is exactly equal to S. In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams. A comparison of the performance of the two models sheds light on the role of words' "sound image" on word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].
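Before turning to the full datasets, the toy example above can be reproduced in a few lines of R; the sketch below simply restates equations (12)–(16), using ginv from the MASS package [82].

library(MASS)   # for ginv()

triphones <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
words     <- c("one", "two", "three")

# cue matrix C of equation (12)
C <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
              0, 0, 0, 1, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 1, 1, 1),
            nrow = 3, byrow = TRUE, dimnames = list(words, triphones))

# semantic matrix S of equation (13)
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(words, words))

F <- ginv(C) %*% S    # equation (15): F = C'S
round(C %*% F, 2)     # equation (14): recovers S exactly for this toy example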


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words; te: tensor product smooth, s: thin plate regression spline.

A. parametric coefficients                            Estimate   Std. Error    t-value    p-value
intercept [location]                                    5.5016       0.2564    21.4537   < 0.0001
intercept [scale]                                      -0.4135       0.0401   -10.3060   < 0.0001
B. smooth terms                                             edf       Ref.df    F-value    p-value
te(activation diversity, word length) [location]         5.3879       6.2897    23.7798     0.0008
s(derivational lexome) [location]                       14.3180      15.0000   380.6254   < 0.0001
s(activation diversity) [scale]                          2.1282       2.5993    28.4336   < 0.0001
s(word length) [scale]                                   1.7722       2.1403   132.7311   < 0.0001

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is most strongly correlated with its target semantic vector s_i, out of all target vectors s_j across all words j.
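A schematic R implementation of this evaluation criterion, applied here to the toy matrices above (for the real data, C and F are the full cue and transformation matrices):

S_hat <- C %*% F                      # predicted semantic vectors, one row per word
r     <- cor(t(S_hat), t(S))          # r[i, j]: correlation of prediction i with target j
accuracy <- mean(apply(r, 1, which.max) == seq_len(nrow(S)))
accuracy                              # 1 for the toy example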

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, for a total of 11,480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), (iii) the word being available in the CELEX database, and (iv) the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset there were a total of 3465 unique trigrams, resulting in an 11,480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10⁻⁷. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.
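A minimal R sketch of how such a trigram cue matrix and the variance-based column selection could be set up; the helper function is ours, only the toy words are used for illustration, and S is assumed to be the full semantic matrix of Section 3.

# letter trigrams of a word, with "#" marking the word boundary
trigrams <- function(w) {
  w <- paste0("#", w, "#")
  substring(w, 1:(nchar(w) - 2), 3:nchar(w))
}
words <- c("one", "two", "three")     # in the real dataset: 11,480 words
cues  <- sort(unique(unlist(lapply(words, trigrams))))
C <- t(sapply(words, function(w) as.integer(cues %in% trigrams(w))))
dimnames(C) <- list(words, cues)

# retain the 5030 highest-variance columns of the full semantic matrix S
keep <- order(apply(S, 2, var), decreasing = TRUE)[1:5030]
S_reduced <- S[, keep]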

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; for the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated, in the mean, with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = −0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derivational semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncracies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping an 11,480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90 and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ1 obtained with network F of the first route, and a vector ŝ2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11,480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
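A schematic R sketch of the two routes and the three measures follows; C (trigram cues), Tphones (a triphone indicator matrix, our name), and S stand for the full matrices described above, and the code computes the measures for all words at once.

library(MASS)
F <- ginv(C) %*% S            # route 1: trigrams -> semantics
K <- ginv(C) %*% Tphones      # trigrams -> triphones
H <- ginv(C %*% K) %*% S      # predicted triphones -> semantics
s1 <- C %*% F                 # route 1 predictions (one row per word)
s2 <- (C %*% K) %*% H         # route 2 predictions

total_activation_diversity <- rowSums(abs(s1)) + rowSums(abs(s2))   # sum of L1-norms
route_congruency <- sapply(seq_len(nrow(C)), function(i) cor(s1[i, ], s2[i, ]))
prior <- colSums(abs(S))      # L1-norm of each lexome's column vector in S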

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed up the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel). For medium range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increased by 10059 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increased by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger



Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures based on ldl; s: thin plate regression spline smooth, te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error    t-value   p-value
intercept                          -1.7774       0.0087   -205.414   < 0.001
word type: Inflected                0.1110       0.0059     18.742   < 0.001
word type: Monomorphemic            0.0233       0.0054      4.322   < 0.001
word length                         0.0117       0.0011     10.981   < 0.001
B. smooth terms                         edf       Ref.df          F   p-value
s(total activation diversity)         6.002        7.222       23.6   < 0.001
te(route congruency, prior)          14.673       17.850      21.37   < 0.001


Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures from ndl; te: tensor product smooth.

A. parametric coefficients                        Estimate   Std. Error    t-value   p-value
intercept                                          -1.6934       0.0086   -195.933   < 0.001
word type: Inflected                                0.0286       0.0052      5.486   < 0.001
word type: Monomorphemic                            0.0042       0.0055      0.770     = 0.4
word length                                         0.0066       0.0011      5.893   < 0.001
B. smooth terms                                         edf       Ref.df          F   p-value
te(activation, activation diversity, prior)          41.34        49.72      10.84   < 0.001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing and that a little over 20% of the words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.
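The following toy R function only illustrates how a single FBSF string of the kind shown in Figure 8 could be assembled from already-discretized intensity levels; the actual features are computed by AcousticNDLCodeR, and the field order here simply follows the example in the figure caption.

fbsf <- function(intensities, band, chunk) {
  paste0("band",    band,
         "-start",  head(intensities, 1),
         "-median", round(median(intensities)),
         "-min",    min(intensities),
         "-max",    max(intensities),
         "-end",    tail(intensities, 1),
         "-part",   chunk)
}
fbsf(c(3, 2, 3, 5, 4, 3), band = 11, chunk = 2)
# "band11-start3-median3-min2-max5-end3-part2"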

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts



Figure 7: Oscillogram for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).


Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, is shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts, where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131,673 word tokens (representing 4779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix Cₐ with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by Xᵀ, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus ( 3 8 ; 7 2 )ᵀ = ( 3 7 ; 8 2 ). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

CF = S
CᵀCF = CᵀS
(CᵀC)⁻¹(CᵀC)F = (CᵀC)⁻¹CᵀS
F = (CᵀC)⁻¹CᵀS                                                    (17)

In this way, matrix inversion is required only for a much smaller matrix, CᵀC, which is a square matrix of size 40,639 × 40,639.
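In R, this amounts to solving the normal equations directly rather than forming a pseudoinverse; a minimal sketch, with C and S standing in for the large cue and semantic matrices:

# normal equations (equation 17): (C^T C) F = C^T S
# crossprod(C) is C^T C; crossprod(C, S) is C^T S
# (if C^T C is singular, a generalized inverse or a small ridge penalty can be used)
F <- solve(crossprod(C), crossprod(C, S))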

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the number of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model; when presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11,480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

             one   two   three
S =   one    1.0   0.3   0.4
      two    0.2   1.0   0.1
      three  0.1   0.1   1.0                                      (18)

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T =
             #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
    one        1     1     1     0     0     0     0     0
    two        0     0     0     1     1     0     0     0
    three      0     0     0     0     0     1     1     1        (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T.                                                           (20)

Given G, we can predict, for any semantic vector s, the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG.                                                           (21)

As before the transformation matrix G is straightforwardto estimate Let S1015840 denote the Moore-Penrose generalizedinverse of S Since

S1015840SG = S1015840T (22)

we have

G = S′T                                                     (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂                                                      (24)

For the present example, the inverse S′ of S is

S′ =
          one     two     three
one       1.10   -0.29   -0.41
two      -0.21    1.07   -0.02
three    -0.09   -0.08    1.04                              (25)

and the transformation matrix G is

G =
          #wV    wVn    Vn#    #tu    tu#    #Tr    Tri    ri#
one       1.10   1.10   1.10  -0.29  -0.29  -0.41  -0.41  -0.41
two      -0.21  -0.21  -0.21   1.07   1.07  -0.02  -0.02  -0.02
three    -0.09  -0.09  -0.09  -0.08  -0.08   1.04   1.04   1.04      (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
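The toy computation can be carried out in a few lines of R. The sketch below, which assumes the MASS package for the generalized inverse, rebuilds S and T from (18) and (19), estimates G following (23), and verifies (24). The object name Tm is used because T is reserved as a shorthand for TRUE in R.

library(MASS)  # for ginv(), the Moore-Penrose generalized inverse

# toy semantic matrix S (rows = words, columns = semantic dimensions), cf. (18)
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"), c("one", "two", "three")))

# toy triphone (target) matrix T, cf. (19)
Tm <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
               0, 0, 0, 1, 1, 0, 0, 0,
               0, 0, 0, 0, 0, 1, 1, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("one", "two", "three"),
                             c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")))

G    <- ginv(S) %*% Tm   # transformation matrix, cf. (23)
That <- S %*% G          # predicted triphone support, cf. (24)
round(That, 2)           # virtually identical to Tm for this toy example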

We made use of the S matrix, which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies, for each of w words, an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the


longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
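The following R sketch illustrates the gist of this procedure for a single word. The variable names and the way edges are derived from triphone overlap are our own illustration (not the authors' code); tri stands for the triphones whose predicted activation exceeds the 0.99 threshold.

library(igraph)

tri <- c("#wV", "wVn", "Vn#")   # triphones with activation > 0.99 for "one"

# connect t1 -> t2 whenever the final two phones of t1 match the first two of t2
edges <- do.call(rbind, lapply(tri, function(t1) {
  succ <- tri[substr(tri, 1, 2) == substr(t1, 2, 3)]
  if (length(succ) > 0) cbind(t1, succ) else NULL
}))
g <- graph_from_edgelist(edges, directed = TRUE)

starts <- tri[substr(tri, 1, 1) == "#"]         # left-edge triphones
paths  <- unlist(lapply(starts, function(s) all_simple_paths(g, from = s)),
                 recursive = FALSE)
best   <- paths[[which.max(sapply(paths, length))]]  # select the longest path
as_ids(best)   # "#wV" "wVn" "Vn#"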

We also evaluated model performance with a second, more general heuristic algorithm that makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m_a} from S_m and add the semantic vector s_a of the affix:

S_a = S_{m_a} + i ⊗ s_a                                     (27)

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a                                               (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

[ S_m ]          [ T_m ]
[ S_a ]  G_a  =  [ T_a ]                                    (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function, a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

[ S_m  ]          [ T_m  ]
[ S_a1 ]          [ T_a1 ]
[ S_a2 ]   G   =  [ T_a2 ]                                  (30)
[ ...  ]          [ ...  ]
[ S_an ]          [ T_an ]

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
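A minimal R sketch of this set-up is given below. All matrix names and dimensions are illustrative stand-ins, not the actual data: it simply shows how semantic vectors for inflected words are obtained following (27) and how the augmented matrices of (30) are stacked before a single G is estimated.

library(MASS)

# illustrative dimensions only: 5 base words, 4 semantic dimensions, 6 triphones
Sm <- matrix(rnorm(5 * 4), nrow = 5)                 # monolexomic semantic vectors
Tm <- matrix(rbinom(5 * 6, 1, 0.4), nrow = 5)        # their triphone vectors

# semantic vectors for k inflected words with inflectional function a, cf. (27)
idx <- c(1, 3, 4)                                    # base words attested with function a
s_a <- rnorm(4)                                      # semantic vector of the function a
S_a <- sweep(Sm[idx, ], 2, s_a, "+")                 # adds s_a to each base row (i %x% s_a, stacked)
T_a <- matrix(rbinom(length(idx) * 6, 1, 0.4), nrow = length(idx))

# augmented matrices and a single joint mapping G, cf. (30)
S_aug <- rbind(Sm, S_a)
T_aug <- rbind(Tm, T_a)
G     <- ginv(S_aug) %*% T_aug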

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus,


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that

contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), for a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.



Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, calculated by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight

goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
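In R, a model of this kind can be specified with the mgcv package along the following lines. This is a sketch of the model summarized in Table 7, not the authors' script: the data frame buckeye and its column names are hypothetical stand-ins for the Buckeye-derived data set described above (one row per word token, with word and speaker coded as factors).

library(mgcv)

m <- bam(log_rel_duration ~ log_edge_weight +    # log LDL edge weight
           phrase_rate +                         # local speech rate
           log_ncount +                          # log neighborhood density
           s(log_frequency) +                    # thin plate regression spline for log frequency
           s(word,    bs = "re") +               # random intercepts for word
           s(speaker, bs = "re"),                # random intercepts for speaker
         data = buckeye)
summary(m)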

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients         Estimate    Std. Error    t-value     p-value
Intercept                           -1.9165      0.0592      -32.3908    < 0.0001
Log LDL Edge Weight                 -0.1358      0.0400       -3.3960      0.0007
Phrase Rate                          0.0047      0.0021        2.2449      0.0248
Log Ncount                           0.1114      0.0189        5.8837    < 0.0001

B. smooth terms                         edf       Ref.df      F-value     p-value
TPRS log Frequency                   1.9007      1.9050        7.5495      0.0004
random intercepts word            1149.8839   1323.0000       31.4919    < 0.0001
random intercepts speaker           30.2995     39.0000       34.5579    < 0.0001

maps (TSOMs, [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
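For completeness, the graphopt layout is available in igraph as layout_with_graphopt. A minimal sketch is given below; the small toy graph merely stands in for a triphone graph such as the one in Figure 9.

library(igraph)

# a toy directed triphone graph (illustrative vertices only)
g <- graph_from_edgelist(cbind(c("#bl", "blE", "lEn", "End"),
                               c("blE", "lEn", "End", "ndI")), directed = TRUE)

# self-organize the vertices in the plane with graphopt and plot the result
coords <- layout_with_graphopt(g)
plot(g, layout = coords, vertex.size = 22, edge.arrow.size = 0.3)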

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very

simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors, or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content


lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk ([natyrlk])) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
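The core of this computation is again a single linear mapping. The R sketch below uses random stand-ins with the right shapes for C (word forms by letter trigrams), L (word forms by semantic dimensions), and S (named-entity lexomes by semantic dimensions, providing the targeted vectors); the real matrices would be built from the sentences of Table 8 as described above.

library(MASS)
set.seed(1)

word_forms <- c("John", "Welsch", "Janet", "Clark")
entities   <- c("JohnClark", "JohnWelsch", "JohnWiggam", "AnneHastie", "JanetClark")
C <- matrix(rbinom(4 * 12, 1, 0.3), nrow = 4, dimnames = list(word_forms, NULL))
L <- matrix(rnorm(4 * 8),           nrow = 4, dimnames = list(word_forms, NULL))
S <- matrix(rnorm(5 * 8),           nrow = 5, dimnames = list(entities,   NULL))

Fmap <- ginv(C) %*% L    # solve CF = L in the least-squares sense
Lhat <- C %*% Fmap       # estimated (predicted) semantic vectors

# correlations of the kind plotted in Figure 10: the estimate for "John" against the
# targeted vector of JohnClark, and the summed estimates for "John" and "Welsch"
# against the targeted vector of JohnWelsch
cor(Lhat["John", ], S["JohnClark", ])
cor(Lhat["John", ] + Lhat["Welsch", ], S["JohnWelsch", ])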

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty

about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

Sentences:
John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

Corresponding lexomes:
JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student


Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, that in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored


[Figure 11 appears here: a network diagram with input systems (cochlea, retina), auditory cues (FBS features), orthographic cues (trigrams), semantic vectors (tasa), auditory targets (triphones), orthographic targets (trigrams), and the output systems speaking and typing, connected by the networks F_a, F_o, K_a, H_a, G_a, G_o, C_a, C_o, T_a, and T_o; see the caption below.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally

whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
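As a concrete illustration, the Widrow-Hoff update for a single learning event can be written as a one-line function in R. The learning rate eta and the zero-initialized network in the usage comment are illustrative choices, not values from the paper.

widrow_hoff_update <- function(W, cue, target, eta = 0.01) {
  # cue and target are 1 x n and 1 x m row vectors (matrices); W is n x m
  # the network is nudged so that cue %*% W moves towards target
  W + eta * crossprod(cue, target - cue %*% W)   # crossprod(x, y) = t(x) %*% y
}

# usage sketch: start from an empty network and update over an ordered
# sequence of learning events (one row of C and S per event)
# W <- matrix(0, nrow = ncol(C), ncol = ncol(S))
# for (i in seq_len(nrow(C)))
#   W <- widrow_hoff_update(W, C[i, , drop = FALSE], S[i, , drop = FALSE])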

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60] who derive thesemantic vector of a complex word from the semantic vectorof its base and a dedicated affix-specific mapping from thepertinent base words to the corresponding derived wordsis in the spirit of compositionality in that a change in form(through affixation) goes hand in hand with a change inmeaning (modeled with a linear mapping in semantic space)

The perspective on morphological processing developedin the present study breaks with this tradition Althoughsemantic vectors for inflected words are obtained by summa-tion of the semantic vectors of the base word and those ofits inflectional functions (see [85] for modeling of the richverbal paradigms of Latin) the form vectors correspondingto the resulting semantic vectors are not obtained by thesummation of vectors for chunks of form nor by any otheroperations on forms Attempts to align chunks of wordsrsquo formwith their semantic structures mdash typically represented bygraphs constructed over semantic primitives (see eg [154])mdash are destined to end in quagmires of arbitrary decisionsabout the number of inflectional classes required (Estonian30 or 300) what the chunks should look like (does Germanhave a morpheme d in its articles and demonstratives derdie das diese ) whether Semitic morphology requiresoperations on form on one or two tiers [155ndash157] andhow many affixal position slots are required for Athebascanlanguages such asNavajo [158] A language that is particularlypainful for compositional theories is Yelı Dnye an Indo-Pacific language spoken on Rossel island (in the south-east ofPapuaNewGuinea)The language has a substantial inventoryof over a 1000 function words used for verb agreementTypically these words are monosyllables that simultaneouslyspecify values for negation tense aspect person and numberof subject deixis evidentiality associated motion and coun-terfactuality Furthermore verbs typically have no less thaneight different stems the use of which depends on particularconstellations of inflectional functions [159 160] Yelı Dnyethus provides a striking counterexample to the hypothesisthat a calculus of meaning would be paralleled by a calculusof form

Whereas the theory of linear discriminative learningradically rejects the idea that morphological processing isgrounded in compositionality as defined in mainstream for-mal linguistic theory it does allow for the conjoint expressionin form of lexical meanings and inflectional functions andfor forms to map onto such conjoint semantic vectors Butjust asmultiple inflectional functions can be expressed jointlyin a single word multiple words can map onto nonconjointsemantic vectors as was illustrated above for named entityrecognition John Wiggam can be understood to refer to avery specific person without this person being a composi-tional function of the proper name John and the surnameWiggam In both these cases there is no isomorphy betweeninflectional functions and lexical meanings on the one handand chunks of form on the other hand

Of course conjoint vectors for lexical meanings andinflectional functions can be implemented andmade to workonly because a Latin verb form such as sapıvissemus isunderstood when building the semantic space S to expresslexomes for person (first) number (plural) voice (active)

tense (pluperfect) mood (subjunctive) and lexical meaning(to know) (see [85] for computational modeling details)Although this could loosely be described as a decomposi-tional approach to inflected words we prefer not to use theterm lsquodecompositionalrsquo as in formal linguistics this termas outlined above has a very different meaning A moreadequate terminology would be that the system is conjointrealizational for production and conjoint inferential for com-prehension (we note here that we have focused on temporallyconjoint realization and inference leaving temporally disjointrealization and inference for sequences of words and possiblycompounds and fixed expression for further research)

Naive versus linear discriminative learning There isone important technical difference between the learningengine of the current study ldl and naive discriminativelearning (ndl) Because ndl makes the simplifying assump-tion that outcomes are orthogonal the weights from the cuesto a given outcome are independent of the weights from thecues to the other outcomes This independence assumptionwhich motivates the word naive in naive discriminativelearning makes it possible to very efficiently update networkweights during incremental learning In ldl by contrast thisindependence no longer holds learning is no longer naiveWe therefore refer to our networks not as ndl networksbut simply as linear discriminative learning networks Actualincremental updating of ldl networks is computationallymore costly but further numerical optimization is possibleand implementations are underway

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study we have obtained good results with asemantic matrix with a dimensionality of around 4000lexomes By itself we find it surprising that a semantic vectorspace can be scaffolded with a mere 4000 words But thisfindingmay also shed light on the phenomenon of childhoodamnesia [163 164] Bauer and Larkina [165] pointed outthat young children have autobiographical memories thatthey may not remember a few years later They argue thatthe rate of forgetting diminishes as we grow older andstabilizes in adulthood Part of this process of forgettingmay relate to the changes in the semantic matrix Cruciallywe expect the correlational structure of lexomes to change


considerably over time as experience accumulates If this isindeed the case the semantic vectors for the lexomes foryoung children are different from the corresponding lexomesfor older children which will again differ from those ofadults As a consequence autobiographical memories thatwere anchored to the lexomes at a young age can no longerbe accessed as the semantic vectors to which these memorieswere originally anchored no longer exist

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon asconsisting of several coupled networks that jointly bringthe cognitive system into states that in turn feed into fur-ther networks (not modeled here) for sentence integration(comprehension) and articulation (production) raises thequestion about the status of units in this theory Units playan important role when constructing numeric vectors forform and meaning Units for letter trigrams and phonetrigrams are of course context-enriched letter and phoneunits Content lexomes as well as lexomes for derivationaland inflectional functions are crutches for central semanticfunctions ranging from functions realizing relatively concreteonomasiological functions (lsquothis is a dog not a catrsquo) toabstract situational functions allowing entities events andstates to be specified on the dimensions of time spacequantity aspect etc However crucial to the central thrust of

the present study there are no morphological symbols (unitscombining form andmeaning) normorphological formunitssuch as stems and exponents of any kind in the modelAbove we have outlined the many linguistic considerationsthat have led us not to want to incorporate such units aspart of our theory Importantly as there are no hidden layersin our model it is not possible to seek for lsquosubsymbolicrsquoreflexes of morphological units in patterns of activation overhidden layers Here we depart from earlier connectionistmodels which rejected symbolic representations but retainedthe belief that there should be morphological representationalbeit subsymbolic ones Seidenberg and Gonnerman [166]for instance discuss morphological representations as beinglsquointerlevel representationsrsquo that are emergent reflections ofcorrelations among orthography phonology and semanticsThis is not to say that themore agglutinative a language is themore the present model will seem to the outside observer tobe operating withmorphemes But themore a language tendstowards fusional or polysyntheticmorphology the less strongthis impression will be

Model complexity Next consider the question of howdifficult lexical processing actually is Sidestepping the com-plexities of the neural embedding of lexical processing [126147] here we narrow down this question to algorithmiccomplexity State-of-the-art models in psychology [37 114167] implementmany layers of hierarchically organized unitsand many hold it for an established fact that such units arein some sense psychologically real (see eg [19 91 168])However empirical results can be equally well understoodwithout requiring the theoretical construct of the morpheme(see eg [47 54 58 166 169 170]) The present studyillustrates this point for lsquoboundaryrsquo effects in written andoral production When paths in the triphone graph branchuncertainty increases and processing slows down Althoughobserved lsquoboundaryrsquo effects are compatible with a post-Bloomfieldian lexicon the very same effects arise in ourmodel even though morphemes do not figure in any way inour computations

Importantly the simplicity of the linear mappings in ourmodel of the discriminative lexicon allows us to sidestep theproblem that typically arise in computationally implementedfull-scale hierarchical systems namely that errors arisingat lower levels propagate to higher levels The major stepsforward made in recent end-to-end models in languageengineering suggest that end-to-end cognitive models arealso likely to perform much better One might argue thatthe hidden layers of deep learning networks represent thetraditional lsquohidden layersrsquo of phonemes morphemes andword forms mediating between form and meaning in lin-guistic models and offshoots thereof in psychology Howeverthe case of baboon lexical learning [12 13] illustrates thatthis research strategy is not without risk and that simplerdiscriminative networks can substantially outperform deeplearning networks [15] We grant however that it is conceiv-able that model predictions will improve when the presentlinear networks are replaced by deep learning networks whenpresented with the same input and output representationsused here (see Zhao et al [171] for a discussion of lossfunctions for deep learning targeting numeric instead of


categorical output vectors) What is unclear is whethersuch networks will improve our understandingmdashthe currentlinear networks offer the analyst connection weights betweeninput and output representations that are straightforwardlylinguistically interpretable Where we think deep learningnetworks really will come into their own is bridging the gapbetween for instance the cochlea and retina on the one handand the heuristic representations (letter trigrams) that wehave used as a starting point for our functional theory of themappings between form and meaning

Network flexibility An important challenge for deeplearning networks designed to model human lexical process-ing is to keep the networks flexible and open to rapid updat-ing This openness to continuous learning is required notonly by findings on categorization [40 41 41 42] includingphonetic categorization [172 173] but is also visible in timeseries of reaction times Baayen and Hendrix [17] reportedimproved prediction of visual lexical decision reaction timesin the British Lexicon Project when a naive discriminativelearning network is updated from trial to trial simulating thewithin-experiment learning that goes on as subjects proceedthrough the experiment In natural speech the phenomenonof phonetic convergence between interlocutors [174] likewisebears witness to a lexicon that is constantly recalibratedin interaction with its environment (for the many ways inwhich interlocutorsrsquo speech aligns see [175]) Likewise therapidity with which listeners adapt to foreign accents [176]within a few minutes shows that lexical networks exhibitfast local optimization As deep learning networks typicallyrequire huge numbers of cycles through the training dataonce trained they are at risk of not having the requiredflexibility for fast local adaptation This risk is reduced for(incremental) wide learning as illustrated by the phoneticconvergence evidenced by the model of Arnold et al [18] forauditory comprehension (a very similar point was made byLevelt [177] in his critique of connectionist models in thenineties)

Open questions This initial study necessarily leavesmany questions unanswered Compounds and inflectedderived words have not been addressed and morphologicaloperations such as reduplication infixation and nonconcate-native morphology provide further testing grounds for thepresent approach We also need to address the processingof compounds with multiple constituents ubiquitous inlanguages as different as German Mandarin and EstonianFurthermore in actual utterances words undergo assimila-tion at their boundaries and hence a crucial test case forthe discriminative lexicon is to model the production andcomprehension of words in multi-word utterances For ourproduction model we have also completely ignored stresssyllabification and pitch which however likely has renderedthe production task more difficult than necessary

A challenge for discriminative morphology is the devel-opment of proper semantic matrices For languages withrich morphology such as Estonian current morphologicalparsers will identify case-inflected words as nominativesgenitives or partitives (among others) but these labelsfor inflectional forms do not correspond to the semanticfunctions encoded in these forms (see eg [178 179]) What

is required are computational tools that detect the appropriateinflectional lexomes for these words in the sentences orutterances in which they are embedded

A related problem specifically for the speech productionmodel is how to predict the strongly reduced word formsthat are rampant in conversational speech [111] Our auditorycomprehension network F119886 is confronted with reduced formsin training and the study of Arnold et al [18] indicates thatpresent identification accuracymay approximate human per-formance for single-word recognition Whereas for auditorycomprehension we are making progress what is required forspeech production is a much more detailed understanding ofthe circumstances under which speakers actually use reducedword forms The factors driving reduction may be in partcaptured by the semantic vectors but we anticipate thataspects of the unfolding discourse play an important role aswell which brings us back to the unresolved question of howto best model not isolated words but words in context

An important challenge for our theory which formalizesincremental implicit learning is how to address and modelthe wealth of explicit knowledge that speakers have abouttheir language We play with the way words sound in poetrywe are perplexed by wordsrsquo meanings and reinterpret themso that they make more sense (eg the folk etymology [180]reflected in modern English crawfish a crustacean and nota fish originally Middle English crevis compare ecrevissein French) we teach morphology in schools and we evenfind that making morphological relations explicit may bebeneficial for aphasic patients [181] The rich culture of worduse cannot but interact with the implicit system but howexactly this interaction unfolds andwhat its consequences areat both sides of the consciousunconscious divide is presentlyunclear

Although many questions remain our model is remark-ably successful in capturing many aspects of lexical andmorphological processing in both comprehension and pro-duction Since themodel builds on a combination of linguisti-callymotivated lsquosmartrsquo low-level representations for form andmeaning and large but otherwise straightforward andmathe-matically well-understood two-layer linear networks (to beclear networks without any hidden layers) the conclusionseems justified that algorithmically the ldquomental lexiconrdquomaybe much simpler than previously thought

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

a = (3, 4),   b = (-2, -3)                                  (A1)



Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

A = [  3   4 ]
    [ -2  -3 ]                                              (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

B = [ -5   2 ]
    [  4  -1 ]                                              (A3)

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}.   (A4)

Such a mapping of A onto B is given by a matrix F:

F = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix},   (A5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.   (A6)

Using matrix notation we can write

AF = B (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}.   (A8)

For numbers, the inverse of multiplication with s is dividing by s:

(1/s)(sx) = x.   (A9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

X^{-1}X = XX^{-1} = I.   (A10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

Y^{-1}(YX) = (XY)Y^{-1} = X.   (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

AF = B
A^{-1}AF = A^{-1}B
IF = A^{-1}B
F = A^{-1}B.   (A12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).
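These calculations are easily verified in R; the following minimal sketch uses only base R (we call the transformation matrix F_map rather than F, since F is reserved in R as shorthand for FALSE):

```r
# toy form matrix A and semantic matrix B, cf. (A2) and (A3)
A <- matrix(c( 3,  4,
              -2, -3), nrow = 2, byrow = TRUE)
B <- matrix(c(-5,  2,
               4, -1), nrow = 2, byrow = TRUE)

# comprehension mapping: F = A^{-1} B, cf. (A12)
F_map <- solve(A) %*% B
F_map
#      [,1] [,2]
# [1,]    1    2
# [2,]   -2   -1

# verify that A F = B, cf. (A6) and (A7)
A %*% F_map
```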

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

BG = A (A13)

Solving this equation proceeds exactly as above

BG = A
B^{-1}BG = B^{-1}A
IG = B^{-1}A
G = B^{-1}A.   (A14)

Complexity 33

The inverse of B is

B^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix},   (A15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = G.   (A16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
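As a minimal illustration of how the generalized inverse is obtained in practice, consider the following R sketch; the singular matrix X used here is made up purely for illustration (any matrix with linearly dependent rows would do):

```r
library(MASS)  # provides ginv(), the Moore-Penrose generalized inverse

# a singular matrix: the second row is twice the first, so solve(X) would fail
X <- matrix(c(1, 2,
              2, 4), nrow = 2, byrow = TRUE)

X_prime <- ginv(X)   # X' in the notation used in this appendix
X_prime

# for a generalized inverse, X X' X = X holds, even though X' X is not the identity
X %*% X_prime %*% X
```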

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words and the columns the features of these words. For instance, the row vectors of A can represent words' form features and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix}.   (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

s = (2, 2),   (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

\begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix}.   (A19)
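The mapping carried out by the network of Figure 13, including its application to the novel form vector s of (A18), is again just vector-by-matrix multiplication, as the following R sketch with the transformation matrix of (A5) illustrates:

```r
F_map <- matrix(c( 1,  2,
                  -2, -1), nrow = 2, byrow = TRUE)

# a trained form vector: the input (3, 4) comes out as (-5, 2), cf. Figure 13
c(3, 4) %*% F_map

# a novel form vector: the same network maps (2, 2) onto (-2, 2), cf. (A19)
c(2, 2) %*% F_map
```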


Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5, and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
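The following R fragment sketches the core of this procedure for a single made-up word, using igraph. The triphone set, the support values, and the helper code are ours and serve only to illustrate the path construction and the length-weighted ranking; the full implementation is part of the WpmWithLdl package described in Baayen et al. [85].

```r
library(igraph)

# hypothetical candidate triphones for the word "one" (DISC: wVn), with made-up support
support <- c("#wV" = 0.8, "wVn" = 0.7, "Vn#" = 0.6, "wVt" = 0.2)

# connect two triphones when the last two segments of the first
# equal the first two segments of the second
tri   <- names(support)
edges <- do.call(rbind, lapply(tri, function(v1) {
  to <- tri[substr(tri, 1, 2) == substr(v1, 2, 3)]
  if (length(to)) cbind(v1, to) else NULL
}))
g <- graph_from_edgelist(edges, directed = TRUE)

# all simple paths from word-initial (#-initial) to word-final (#-final) triphones
initial <- tri[substr(tri, 1, 1) == "#"]
final   <- tri[substr(tri, 3, 3) == "#"]
paths   <- unlist(lapply(initial, function(v)
  all_simple_paths(g, from = v, to = final)), recursive = FALSE)

# weight raw path support by path length and select the best-supported path
weighted <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
as_ids(paths[[which.max(weighted)]])   # "#wV" "wVn" "Vn#"
```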

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected


Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning "orthographic" representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning, an implementation in R, R package," 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the "cost" of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL, R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA; London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Bearman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of the Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215-216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


consisting of a base and a novel derived form (e.g., accent/accentable), reported by Marelli and Baroni [60] and available at http://clic.cimec.unitn.it/composes/FRACSS. The number of pairs available to us for analysis, 236, was restricted compared to the original 2559 pairs due to the constraint that the base words had to occur in the TASA corpus. Furthermore, we also explored the role of words' emotional valence, arousal, and dominance, as available in Warriner et al. [83], which restricted the number of items even further. The reason for doing so is that human judgements, such as those of age of acquisition, may reflect dimensions of emotion [59].

A measure derived from the semantic vectors of the derivational lexomes, activation diversity, i.e., the L1-norm of the semantic vector (the sum of the absolute values of the vector elements, its city-block distance), turned out to be predictive for these plausibility ratings, as documented in Table 3 and the left panel of Figure 4. Activation diversity is a measure of lexicality [58]. In an auditory word identification task, for instance, speech that gives rise to a low activation diversity elicits fast rejections, whereas speech that generates high activation diversity elicits higher acceptance rates, but at the cost of longer response times [18].
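Since activation diversity is defined as the L1-norm of a semantic vector, it can be read off directly from the rows of the semantic matrix; the following R sketch uses a small made-up matrix S purely for illustration:

```r
# small made-up semantic matrix: words in rows, semantic dimensions in columns
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"), NULL))

# activation diversity: the L1-norm (city-block length) of each word's row vector
activation_diversity <- rowSums(abs(S))
activation_diversity
```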

Activation diversity interacted with word length. A greater word length had a strong positive effect on rated plausibility, but this effect progressively weakened as activation diversity increased. In turn, activation diversity had a positive effect on plausibility for shorter words and a negative effect for longer words. Apparently, the evidence for lexicality that comes with higher L1-norms contributes positively when there is little evidence coming from word length. As word length increases and the relative contribution of the formative in the word decreases, the greater uncertainty that comes with higher lexicality (a greater L1-norm implies stronger links to many other lexomes) has a detrimental effect on the ratings. Higher arousal scores also contributed to higher perceived plausibility. Valence and dominance were not predictive and were therefore left out of the specification of the model reported here (word frequency was not included as predictor because all derived words are neologisms with zero frequency; addition of base frequency did not improve model fit, p > 0.91).

The degree to which a complex word is semantically transparent with respect to its base word is of both practical and theoretical interest. An evaluation of the semantic transparency of complex words using transformation matrices is developed in Marelli and Baroni [60]. Within the present framework, semantic transparency can be examined straightforwardly by comparing the correlations of (i) the semantic vector of the base word to which the semantic vector of the affix is added with (ii) the semantic vector of the derived word itself. The more distant the two vectors are, the more negative their correlation should be. This is exactly what we find. For ness, for instance, the six most negative correlations are business (r = -0.66), effectiveness (r = -0.51), awareness (r = -0.50), loneliness (r = -0.45), sickness (r = -0.44), and consciousness (r = -0.43). Although ness can have an anaphoric function in discourse [84], words such as business and consciousness have a much deeper and richer semantics than just reference to a previously mentioned state of affairs. A simple comparison of the word's actual semantic vector in S (derived from the TASA corpus) and its semantic vector obtained by accumulating evidence over base and affix (i.e., summing the vectors of base and affix) brings this out straightforwardly.
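In R, this comparison amounts to a single correlation; the sketch below assumes a semantic matrix S with named row vectors for the derived word, its base, and the affix lexome, and the row names in the commented example are purely illustrative:

```r
# semantic transparency: correlation between a derived word's own semantic
# vector and the vector obtained by summing the vectors of its base and affix
transparency <- function(S, derived, base, affix) {
  cor(S[derived, ], S[base, ] + S[affix, ])
}

# illustrative call (hypothetical row names); a strongly negative value
# indicates low transparency, as for business above
# transparency(S, "business", "busy", "NESS")
```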

However, evaluating correlations for transparency is prone to researcher bias. We therefore also investigated to what extent the semantic transparency ratings collected by Lazaridou et al. [68] can be predicted from the activation diversity of the semantic vector of the derivational lexome. The original dataset of Lazaridou and colleagues comprises 900 words, of which 330 meet the criterion of occurring more than 8 times in the TASA corpus and for which we have semantic vectors available. The summary of a Gaussian location-scale additive model is given in Table 4, and the right panel of Figure 4 visualizes the interaction of word length by activation diversity. Although modulated by word length, overall, transparency ratings decrease as activation diversity is increased. Apparently, the stronger the connections a derivational lexome has with other lexomes, the less clear it becomes what its actual semantic contribution to a novel derived word is. In other words, under semantic uncertainty, transparency ratings decrease.
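For readers who wish to fit a comparable model, the general form of a Gaussian location-scale GAM in the mgcv package is sketched below; the data frame and column names are hypothetical, and this is a schematic specification rather than the exact code underlying Table 4:

```r
library(mgcv)

# location-scale GAM: one formula for the mean, one for the scale.
# 'ratings' is assumed to contain columns transparency, activationDiversity,
# wordLength, and lexome (the derivational lexome, coded as a factor).
m <- gam(list(transparency ~ te(activationDiversity, wordLength) +
                              s(lexome, bs = "re"),
              ~ s(activationDiversity) + s(wordLength)),
         data = ratings, family = gaulss())
summary(m)
```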

4. Comprehension

Now that we have established that the present semantic vectors make sense, even though they are based on a small corpus, we next consider a comprehension model that has form vectors as input and semantic vectors as output. (A package for R implementing the comprehension and production algorithms of linear discriminative learning is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/WpmWithLdl_1.0.tar.gz. The package is described in Baayen et al. [85].) We begin by introducing the central concepts underlying mappings from form to meaning. We then discuss visual comprehension and then turn to auditory comprehension.

4.1. Setting Up the Mapping. Let C denote the cue matrix, a matrix that specifies for each word (rows) the form cues of that word (columns). For a toy lexicon with the words one, two, and three, the C matrix is

C =
          #wV  wVn  Vn#  #tu  tu#  #Tr  Tri  ri#
one         1    1    1    0    0    0    0    0
two         0    0    0    1    1    0    0    0
three       0    0    0    0    0    1    1    1          (12)

where we use the DISC keyboard phonetic alphabet (the "DIstinct Single Character" representation with one character for each phoneme was introduced by CELEX [86]) for triphones; the symbol # denotes the word boundary. Suppose that the semantic vectors for these words are the row vectors


Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients              Estimate  Std. Error  t-value   p-value
Intercept                                -1.8072      2.7560  -0.6557    0.5127
Word Length                               0.5901      0.2113   2.7931    0.0057
Activation Diversity                      1.1221      0.6747   1.6632    0.0976
Arousal                                   0.3847      0.1521   2.5295    0.0121
Word Length × Activation Diversity       -0.1318      0.0500  -2.6369    0.0089
B. smooth terms                              edf      Ref.df  F-value   p-value
Random Intercepts Affix                   3.3610      4.0000  55.6723  < 0.0001

of the following matrix S

S =
          one   two   three
one       1.0   0.3   0.4
two       0.2   1.0   0.1
three     0.1   0.1   1.0          (13)

We are interested in a transformation matrix F such that

CF = S (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S. (15)

For the present example

F =
          one    two    three
#wV       0.33   0.10   0.13
wVn       0.33   0.10   0.13
Vn#       0.33   0.10   0.13
#tu       0.10   0.50   0.05
tu#       0.10   0.50   0.05
#Tr       0.03   0.03   0.33
Tri       0.03   0.03   0.33
ri#       0.03   0.03   0.33          (16)

and for this simple example, CF is exactly equal to S.

In the remainder of this section, we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition, we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams. A comparison of the performance of the two models


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words; te: tensor product smooth, and s: thin plate regression spline.

A. parametric coefficients                            Estimate  Std. Error   t-value   p-value
intercept [location]                                    5.5016      0.2564   21.4537  < 0.0001
intercept [scale]                                      -0.4135      0.0401  -10.3060  < 0.0001
B. smooth terms                                            edf      Ref.df   F-value   p-value
te(activation diversity, word length) [location]        5.3879      6.2897   23.7798    0.0008
s(derivational lexome) [location]                      14.3180     15.0000  380.6254  < 0.0001
s(activation diversity) [scale]                         2.1282      2.5993   28.4336  < 0.0001
s(word length) [scale]                                  1.7722      2.1403  132.7311  < 0.0001

sheds light on the role of words' "sound image" on word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is most strongly correlated with the target semantic vector s_i of all target vectors s_j, across all words j.
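The complete pipeline, mapping and evaluation, can be made concrete with the toy lexicon of (12) and (13); the following R sketch estimates F with the generalized inverse and then checks, for each word, whether its predicted vector correlates most strongly with its own target vector:

```r
library(MASS)  # for ginv()

words     <- c("one", "two", "three")
triphones <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")

# cue matrix C of (12): which triphones occur in which word
C <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
              0, 0, 0, 1, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 1, 1, 1),
            nrow = 3, byrow = TRUE, dimnames = list(words, triphones))

# semantic matrix S of (13)
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(words, words))

# transformation matrix F = C'S, with C' the generalized inverse of C, cf. (15)
Fmat <- ginv(C) %*% S

# predicted semantic vectors
Shat <- C %*% Fmat

# a word counts as recognized when its predicted vector correlates most
# strongly with its own target vector
predicted <- apply(cor(t(Shat), t(S)), 1, which.max)
mean(predicted == seq_along(words))   # proportion correctly recognized (here: 1)
```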

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations) and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, to a total of 11480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), (iii)


the word being available in the CELEX database, and the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3465 unique trigrams, resulting in a 11480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10⁻⁷. From S and C, we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.
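The column selection described here simply retains the semantic dimensions with the largest variances; a minimal sketch, assuming S is the semantic matrix with words in rows:

```r
# retain only the semantic dimensions (columns of S) with variance above the cutoff
cutoff  <- 0.34e-7
col_var <- apply(S, 2, var)
S_reduced <- S[, col_var > cutoff, drop = FALSE]
dim(S_reduced)
```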

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, the performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were, overall, well correlated in the mean with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all 6595) than derived words (898), whereas the number of different inflectional functions (7) was much smaller than the number of derivational functions (24). However, the negative correlations of the predicted semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route: From Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100-105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping an 11,480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have, for each word, two estimated semantic vectors: a vector ŝ1 obtained with network F of the first route, and a vector ŝ2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
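The dual-route set-up can be sketched in the same style; this is again a toy illustration with our own variable names and dimensions, not the study's code: network K is estimated from trigrams to triphones, and network H is then estimated from the predicted triphone vectors to the semantic vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
n_words, n_trigrams, n_triphones, n_sem = 200, 80, 90, 50
Co = rng.integers(0, 2, (n_words, n_trigrams)).astype(float)   # orthographic (trigram) cues
T  = rng.integers(0, 2, (n_words, n_triphones)).astype(float)  # targeted triphone vectors
S  = rng.normal(size=(n_words, n_sem))                         # semantic vectors

# first route: trigrams -> semantics (network F)
F, *_ = np.linalg.lstsq(Co, S, rcond=None)
s1_hat = Co @ F

# second route: trigrams -> triphones (network K), then
# predicted triphones -> semantics (network H, trained on the predictions)
K, *_ = np.linalg.lstsq(Co, T, rcond=None)
T_hat = Co @ K
H, *_ = np.linalg.lstsq(T_hat, S, rcond=None)
s2_hat = T_hat @ H
```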

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11,480 words in the current dataset are also included in the BLP. We derived three measures from each of the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
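The three predictors can be computed directly from these estimates. The sketch below is ours; the helper name is hypothetical, and it simply assumes that the word's prior is passed in as the relevant column vector of S.

```python
import numpy as np

def ldl_measures(s1_hat, s2_hat, prior_vector):
    """Three LDL-based predictors for a single word (sketch)."""
    total_activation_diversity = np.abs(s1_hat).sum() + np.abs(s2_hat).sum()  # sum of L1-norms
    route_congruency = np.corrcoef(s1_hat, s2_hat)[0, 1]                      # agreement of the two routes
    prior = np.abs(prior_vector).sum()      # L1-norm of the lexome's column vector in S
    return total_activation_diversity, route_congruency, prior

rng = np.random.default_rng(0)
print(ldl_measures(rng.normal(size=50), rng.normal(size=50), rng.normal(size=50)))
```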

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RTs in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RTs that is increasingly attenuated as total activation diversity increases (left panel). Total activation also interacted with the total prior (right panel). For medium-range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 651.01 AIC units. At the same time, the GAM based on ldl is less complex.

Given that the ldl measures predict human behavioral data substantially better, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increases by 100.59 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increases by no less than 147.90 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features (5929) being larger than the number of trigram features (3465).


Figure 5. The partial effects of total activation diversity (left) and of the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5. Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients      Estimate   Std. Error   t-value    p-value
intercept                        -1.7774     0.0087     -205.414   <0.0001
word type: Inflected              0.1110     0.0059       18.742   <0.0001
word type: Monomorphemic          0.0233     0.0054        4.322   <0.0001
word length                       0.0117     0.0011       10.981   <0.0001

B. smooth terms                      edf      Ref.df       F       p-value
s(total activation diversity)       6.002      7.222      23.6     <0.0001
te(route congruency, prior)        14.673     17.850      21.37    <0.0001

Figure 6. The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6. Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients      Estimate   Std. Error   t-value    p-value
intercept                        -1.6934     0.0086     -195.933   <0.0001
word type: Inflected              0.0286     0.0052        5.486   <0.0001
word type: Monomorphemic          0.0042     0.0055        0.770   = 0.44
word length                       0.0066     0.0011        5.893   <0.0001

B. smooth terms                                    edf     Ref.df     F      p-value
te(activation, activation diversity, prior)       41.34    49.72     10.84   <0.0001

More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations; see https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy (92%) with which orthographic trigram vectors are mapped onto phonological triphone vectors.

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as its primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of the words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112-114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises of what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL-scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.
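The chunking step can be illustrated with a small sketch; this is only the first stage of the FBSF extraction described above, written in Python rather than with the AcousticNDLCodeR package, and the smoothing window is our own (hypothetical) choice.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks

def chunk_boundaries(waveform, smooth=400):
    """Sample indices of minima of the (lightly smoothed) Hilbert
    amplitude envelope, used here as chunk boundaries."""
    envelope = np.abs(hilbert(waveform))                    # Hilbert amplitude envelope
    kernel = np.ones(smooth) / smooth
    envelope = np.convolve(envelope, kernel, mode="same")   # light smoothing
    minima, _ = find_peaks(-envelope)                       # minima = peaks of the negated envelope
    return minima

# usage with a synthetic two-burst signal standing in for a speech token
t = np.linspace(0, 0.6, 9600)
signal = np.sin(2 * np.pi * 220 * t) * (np.exp(-((t - 0.15) ** 2) / 0.002) +
                                        np.exp(-((t - 0.45) ** 2) / 0.002))
print(chunk_boundaries(signal)[:5])
```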

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts provided to us by the Distributed Little Red Hen Lab.


Figure 7. Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8. Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

The audio files of this resource were automatically classified as clean, for relatively clean parts where there is speech without background noise or music, and as noisy, for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131,673 word tokens (representing 4779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus $\left(\begin{smallmatrix}3 & 8\\ 7 & 2\end{smallmatrix}\right)^{T} = \left(\begin{smallmatrix}3 & 7\\ 8 & 2\end{smallmatrix}\right)$. For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

$$\begin{aligned}
\mathbf{C}\mathbf{F} &= \mathbf{S}\\
\mathbf{C}^{T}\mathbf{C}\,\mathbf{F} &= \mathbf{C}^{T}\mathbf{S}\\
(\mathbf{C}^{T}\mathbf{C})^{-1}(\mathbf{C}^{T}\mathbf{C})\,\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S}\\
\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S}
\end{aligned} \tag{17}$$

In this way, matrix inversion is required only for the much smaller matrix C^T C, which is a square matrix of size 40,639 × 40,639.
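In code, this amounts to solving the normal equations rather than forming a pseudoinverse of the tall cue matrix. The numpy sketch below uses toy dimensions and our own variable names; it assumes, as noted in the comment, that C^T C is nonsingular.

```python
import numpy as np

rng = np.random.default_rng(3)
n_tokens, n_cues, n_dims = 5000, 300, 120   # toy stand-ins for 131,673 tokens, 40,639 FBSFs, 4609 dimensions
C = rng.integers(0, 2, (n_tokens, n_cues)).astype(float)   # FBSF indicator cues
S = rng.normal(size=(n_tokens, n_dims))                    # targeted semantic vectors

# F = (C'C)^{-1} C'S, computed by solving the normal equations
CtC = C.T @ C
CtS = C.T @ S
F = np.linalg.solve(CtC, CtS)   # assumes C'C is nonsingular; otherwise fall back on lstsq
S_hat = C @ F                   # estimated semantic vectors for all audio tokens
```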

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolutional network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model: when presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11,480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121-123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as the elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

$$\mathbf{S} = \begin{array}{c|ccc}
 & \text{one} & \text{two} & \text{three}\\ \hline
\text{one} & 1.0 & 0.3 & 0.4\\
\text{two} & 0.2 & 1.0 & 0.1\\
\text{three} & 0.1 & 0.1 & 1.0
\end{array} \tag{18}$$

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


$$\mathbf{T} = \begin{array}{c|cccccccc}
 & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#}\\ \hline
\text{one} & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0\\
\text{two} & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0\\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array} \tag{19}$$

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

$$\mathbf{S}\mathbf{G} = \mathbf{T}. \tag{20}$$

Given G, we can predict for any semantic vector s the vector t̂ that quantifies the support for the triphones provided by s, simply by multiplying s with G:

$$\hat{\mathbf{t}} = \mathbf{s}\mathbf{G}. \tag{21}$$

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

$$\mathbf{S}'\mathbf{S}\mathbf{G} = \mathbf{S}'\mathbf{T}, \tag{22}$$

we have

$$\mathbf{G} = \mathbf{S}'\mathbf{T}. \tag{23}$$

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

$$\mathbf{S}\mathbf{G} = \hat{\mathbf{T}}. \tag{24}$$

For the present example, the inverse S′ of S is

$$\mathbf{S}' = \begin{array}{c|ccc}
 & \text{one} & \text{two} & \text{three}\\ \hline
\text{one} & 1.10 & -0.29 & -0.41\\
\text{two} & -0.21 & 1.07 & -0.02\\
\text{three} & -0.09 & -0.08 & 1.04
\end{array} \tag{25}$$

and the transformation matrix G is

$$\mathbf{G} = \begin{array}{c|cccccccc}
 & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#}\\ \hline
\text{one} & 1.10 & 1.10 & 1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41\\
\text{two} & -0.21 & -0.21 & -0.21 & 1.07 & 1.07 & -0.02 & -0.02 & -0.02\\
\text{three} & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 & 1.04 & 1.04 & 1.04
\end{array} \tag{26}$$

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
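The toy computation can be replicated directly; the sketch below (Python/numpy, ours) uses the matrices of (18) and (19), and the pseudoinverse reproduces (25) and (26) up to rounding.

```python
import numpy as np

S = np.array([[1.0, 0.3, 0.4],     # one
              [0.2, 1.0, 0.1],     # two
              [0.1, 0.1, 1.0]])    # three

#             #wV wVn Vn# #tu tu# #Tr Tri ri#
T = np.array([[1, 1, 1, 0, 0, 0, 0, 0],   # one
              [0, 0, 0, 1, 1, 0, 0, 0],   # two
              [0, 0, 0, 0, 0, 1, 1, 1]])  # three

S_prime = np.linalg.pinv(S)   # Moore-Penrose generalized inverse, cf. (25)
G = S_prime @ T               # transformation matrix, cf. (26)
t_hat = S[0] @ G              # predicted triphone support for 'one', cf. (21)

print(np.round(S_prime, 2))
print(np.round(G, 2))
print(np.round(t_hat, 2))     # close to (1, 1, 1, 0, 0, 0, 0, 0)
```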

We made use of the S matrix which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies, for each of w words, an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m); those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words: targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
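A bare-bones version of this path heuristic can be written without igraph. In the sketch below (ours; the chaining rule and helper name are our own simplifications), triphones whose support exceeds the 0.99 threshold are treated as vertices, two triphones are chained when they overlap in two segments, and the longest simple path starting from a word-initial triphone (marked with '#') is returned.

```python
def longest_triphone_path(support, threshold=0.99):
    """support: dict mapping triphone (3-character string) -> activation."""
    active = [tri for tri, a in support.items() if a > threshold]
    starts = [tri for tri in active if tri.startswith("#")]
    best = []

    def extend(path):
        nonlocal best
        if len(path) > len(best):
            best = path[:]
        last = path[-1]
        for tri in active:
            # chain triphones that overlap in two segments, e.g. 'wVn' -> 'Vn#'
            if tri not in path and last[1:] == tri[:2]:
                extend(path + [tri])

    for s in starts:
        extend([s])
    return best

# usage: support values as might come out of t_hat for the word 'one'
support = {"#wV": 1.0, "wVn": 0.9995, "Vn#": 1.0, "#tu": 0.001, "tu#": 0.002}
print(longest_triphone_path(support))   # ['#wV', 'wVn', 'Vn#']
```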

We also evaluated model performance with a second, more general heuristic algorithm that makes use of the same function from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m_a} from S_m and add the semantic vector s_a of the affix:

$$\mathbf{S}_a = \mathbf{S}_{m_a} + \mathbf{i} \otimes \mathbf{s}_a. \tag{27}$$

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a,

$$\mathbf{S}_a \mathbf{G}_a = \mathbf{T}_a, \tag{28}$$

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

$$\begin{bmatrix} \mathbf{S}_m \\ \mathbf{S}_a \end{bmatrix} \mathbf{G}_a = \begin{bmatrix} \mathbf{T}_m \\ \mathbf{T}_a \end{bmatrix}. \tag{29}$$

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

$$\begin{bmatrix} \mathbf{S}_m \\ \mathbf{S}_{a_1} \\ \mathbf{S}_{a_2} \\ \vdots \\ \mathbf{S}_{a_n} \end{bmatrix} \mathbf{G} = \begin{bmatrix} \mathbf{T}_m \\ \mathbf{T}_{a_1} \\ \mathbf{T}_{a_2} \\ \vdots \\ \mathbf{T}_{a_n} \end{bmatrix}. \tag{30}$$

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
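A compact sketch of this augmented set-up (toy data and our own variable names; the real matrices have the dimensions reported below): semantic vectors for inflected words are formed by adding the affix vector to the base vectors, the blocks are stacked as in (30), and a single G is estimated.

```python
import numpy as np

rng = np.random.default_rng(4)
n_base, n_sem, n_tri = 300, 120, 150
S_m = rng.normal(size=(n_base, n_sem))                    # base-word semantic vectors
T_m = rng.integers(0, 2, (n_base, n_tri)).astype(float)   # base-word triphone vectors

# one block per inflectional function a: S_a = S_m[idx] + s_a, cf. (27)
blocks_S, blocks_T = [S_m], [T_m]
for _ in range(3):                                        # three toy inflectional functions
    s_a = rng.normal(size=n_sem)                          # semantic vector of the function
    idx = rng.choice(n_base, size=100, replace=False)     # base words attested with it
    T_a = rng.integers(0, 2, (100, n_tri)).astype(float)  # their (toy) inflected triphone vectors
    blocks_S.append(S_m[idx] + s_a)
    blocks_T.append(T_a)

S_aug = np.vstack(blocks_S)                               # left-hand side of (30)
T_aug = np.vstack(blocks_T)
G, *_ = np.linalg.lstsq(S_aug, T_aug, rcond=None)         # single mapping for all functions
```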

We selected 6595 inflected variants, which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles; we followed the analyses of the TreeTagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10,582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.
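One fold of this cross-validation can be sketched as follows (toy data, our own variable names): the held-out inflected forms are excluded when G is estimated, and a held-out form's triphone support is then predicted from its summed semantic vector.

```python
import numpy as np

rng = np.random.default_rng(6)
n_stems, n_sem, n_tri, n_infl = 400, 150, 180, 600
S_stems = rng.normal(size=(n_stems, n_sem))
T_stems = rng.integers(0, 2, (n_stems, n_tri)).astype(float)
s_affix = rng.normal(size=(3, n_sem))                     # three toy inflectional functions

# toy inflected forms: (stem index, inflectional function index, triphone vector)
forms = [(int(rng.integers(n_stems)), int(rng.integers(3)),
          rng.integers(0, 2, n_tri).astype(float)) for _ in range(n_infl)]
S_infl = np.array([S_stems[i] + s_affix[a] for i, a, _ in forms])
T_infl = np.array([t for _, _, t in forms])

folds = np.array_split(rng.permutation(n_infl), 10)
held_out = set(folds[0].tolist())                         # evaluate the first fold only
train = np.array([k not in held_out for k in range(n_infl)])

G, *_ = np.linalg.lstsq(np.vstack([S_stems, S_infl[train]]),
                        np.vstack([T_stems, T_infl[train]]), rcond=None)

k = next(iter(held_out))
t_hat = S_infl[k] @ G    # predicted triphone support for an unseen inflected form, cf. (21)
```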

The proportion of forms that were predicted correctly was 0.62; the proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English: for instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense of blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouths does not follow the dictionary norm, but is in use, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 dimensions (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our dataset (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9. The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests that variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15,105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as the predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL edge weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133-138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs) [139-141].


Table 7. Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients    Estimate    Std. Error    t-value     p-value
Intercept                      -1.9165      0.0592      -32.3908    < 0.0001
Log LDL Edge Weight            -0.1358      0.0400       -3.3960      0.0007
Phrase Rate                     0.0047      0.0021        2.2449      0.0248
Log Ncount                      0.1114      0.0189        5.8837    < 0.0001

B. smooth terms                      edf        Ref.df      F-value    p-value
TPRS log Frequency                  1.9007      1.9050       7.5495     0.0004
random intercepts word           1149.8839   1323.0000      31.4919   < 0.0001
random intercepts speaker          30.2995     39.0000      34.5579   < 0.0001

TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can easily be enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4. Discussion. Our production network reconstructs known words' forms with high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid graphs becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one: it is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically, compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L, specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...), was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
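A toy version of this simulation can be sketched as follows; this is our own construction, in which random vectors stand in for the semantic vectors obtained from training on the sentence tokens, and the learning events pair each orthographic word with the lexome of its named entity or content word.

```python
import numpy as np

rng = np.random.default_rng(5)

def trigrams(word):
    padded = "#" + word + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

lexomes = ["JohnClark", "JohnWelsch", "JohnWiggam", "AnneHastie", "JanetClark",
           "write", "great", "book"]
S = {lex: rng.normal(size=30) for lex in lexomes}   # stand-in semantic vectors

# learning events: an orthographic word paired with the lexome it discriminates
events = [("John", "JohnClark"), ("Clark", "JohnClark"),
          ("John", "JohnWelsch"), ("Welsch", "JohnWelsch"),
          ("John", "JohnWiggam"), ("Wiggam", "JohnWiggam"),
          ("Anne", "AnneHastie"), ("Hastie", "AnneHastie"),
          ("Janet", "JanetClark"), ("Clark", "JanetClark"),
          ("wrote", "write"), ("great", "great"), ("books", "book")]

cues = sorted(set().union(*[trigrams(w) for w, _ in events]))
C = np.array([[1.0 if c in trigrams(w) else 0.0 for c in cues] for w, _ in events])
L = np.array([S[lex] for _, lex in events])

F, *_ = np.linalg.lstsq(C, L, rcond=None)           # solve CF = L

def estimate(word):
    return np.array([1.0 if c in trigrams(word) else 0.0 for c in cues]) @ F

# reading 'John Welsch' with two fixations: the two estimated vectors are summed
composite = estimate("John") + estimate("Welsch")
for lex in ["JohnClark", "JohnWelsch", "JohnWiggam", "AnneHastie", "JanetClark"]:
    print(lex, round(np.corrcoef(composite, S[lex])[0, 1], 2))
```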

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As, in the small world of the present toy example, Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].


Table 8. Example sentences for the reading of proper names, each paired with its corresponding lexomes. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph of airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student

Figure 10: Correlations (Pearson) of the estimated semantic vectors for John, John Welsch, Janet, and Clark (one panel each) with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.


Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see [47] for some evidence). One such interaction is explored in Baayen et al. [85]: in their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.


Figure 11: Overview of the discriminative lexicon. Input and output systems (cochlea and retina; typing and speaking) are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features: orthographic cues (trigrams), auditory cues (FBS features), semantic vectors (TASA), auditory targets (triphones), and orthographic targets (trigrams). Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.


Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but, importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
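A minimal sketch of such incremental estimation is given below in R; it implements the Widrow-Hoff update for a single learning event and applies it over a stream of simulated events. The dimensions, the learning rate, and the toy target mapping are illustrative assumptions.

```r
# Widrow-Hoff (delta rule) update: after each learning event, every
# connection weight is nudged in proportion to its cue's activation
# and the prediction error on the semantic dimensions.
widrow_hoff_update <- function(W, cue, sem, eta = 0.01) {
  e <- sem - as.vector(cue %*% W)   # prediction error for this event
  W + eta * (cue %o% e)             # adjust weights cue-by-cue
}

set.seed(1)
true_map <- matrix(rnorm(5 * 3), 5, 3)      # invented cue-to-semantics association
W <- matrix(0, nrow = 5, ncol = 3)          # 5 form cues, 3 semantic dimensions
for (t in 1:5000) {
  cue <- rbinom(5, 1, 0.5)                  # cues present in this learning event
  sem <- as.vector(cue %*% true_map)        # toy targeted semantic vector
  W   <- widrow_hoff_update(W, cue, sem)
}
round(W - true_map, 2)                      # W approaches the toy association
```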

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155-157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the southeast of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.
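The following R sketch, with invented toy matrices, illustrates this point: the weights obtained in one step with the generalized inverse (MASS::ginv, see the Appendix) are approximated by trial-by-trial Widrow-Hoff updates over randomly sampled tokens, none of which needs to be stored.

```r
# Compare the one-step generalized-inverse estimate with incremental
# Widrow-Hoff learning over a stream of randomly sampled tokens.
library(MASS)
set.seed(2)
C <- matrix(rbinom(8 * 5, 1, 0.5), nrow = 8)   # 8 tokens, 5 form cues
S <- matrix(rnorm(8 * 3), nrow = 8)            # 8 tokens, 3 semantic dimensions
F_ginv <- ginv(C) %*% S                        # one-step estimate

F_inc <- matrix(0, 5, 3)
eta <- 0.005
for (t in 1:50000) {                           # ephemeral tokens, no storage
  i <- sample(nrow(C), 1)
  e <- S[i, ] - as.vector(C[i, ] %*% F_inc)    # prediction error
  F_inc <- F_inc + eta * (C[i, ] %o% e)        # Widrow-Hoff update
}
max(abs(F_inc - F_ginv))                       # small, and smaller for smaller eta
```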

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are, of course, context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucially for the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow this question down to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made by recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms that mediate between form and meaning in linguistic models and their offshoots in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly interpretable linguistically. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and, on the other, the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (−2, −3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lowercase letters in bold, and we place the x coordinate of a point next to the y coordinate:

\[ \mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3) \]  (A1)


Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\[ \mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \]  (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\[ \mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \]  (A3)

Let us assume that points a and b represent the forms of two words ω₁ and ω₂, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:
\[ \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix} \]  (A4)

Such a mapping of A onto B is given by a matrix F:

\[ \mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} \]  (A5)

and it is straightforward to verify that indeed
\[ \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \]  (A6)

Using matrix notation, we can write

\[ \mathbf{A}\mathbf{F} = \mathbf{B} \]  (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \]  (A8)

For numbers, the inverse of multiplication with s is dividing by s:
\[ \left(\frac{1}{s}\right)(sx) = x \]  (A9)

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:

\[ \mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I} \]  (A10)

For nonsquare matrices, the inverse Y⁻¹ is defined such that

\[ \mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X} \]  (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

\[ \begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \end{aligned} \]  (A12)

Since for the present example A and its inverse happen to be identical (A⁻¹ = A), we obtain (A6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find the matrix G that maps B onto A, i.e.,

\[ \mathbf{B}\mathbf{G} = \mathbf{A} \]  (A13)

Solving this equation proceeds exactly as above:

\[ \begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \end{aligned} \]  (A14)


The inverse of B is

\[ \mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \]  (A15)

and hence we have

\[ \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} = \mathbf{G} \]  (A16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalizes to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:
\[ \begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix} \]  (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i₁ and i₂ are set to 3 and 4, respectively. To obtain the activations o₁ = −5 and o₂ = 2 for the output vector (−5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o₁ we obtain the value 3 × 1 + 4 × (−2) = −5, and for o₂ we have 3 × 2 + 4 × (−1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (−5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\[ \mathbf{s} = (2, 2) \]  (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
\[ \begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix} \]  (A19)
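For readers who want to verify these calculations, the following R snippet reproduces the worked example of this appendix; for the singular matrices used in the main text, solve() would be replaced by the generalized inverse MASS::ginv().

```r
# The worked example of Appendix A in R.
A <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows: the form vectors a and b
B <- matrix(c(-5, 4, 2, -1), nrow = 2)   # rows: the semantic vectors x and y
F <- solve(A) %*% B                      # F = A^{-1} B, cf. (A12); here A^{-1} = A
A %*% F                                  # reproduces B, cf. (A6)
c(2, 2) %*% F                            # the novel vector is mapped onto (-2, 2), cf. (A19)
```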

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i₁ and i₂ (in blue), and two output nodes, o₁ and o₂ (in red): form features go in at the input nodes, semantic vectors come out at the output nodes, and the connection weights are the transformation matrix. When the input vector (3, 4) is presented to the network, the value at o₁ is 3 × 1 + 4 × (−2) = −5, and that at o₂ is 3 × 2 + 4 × (−1) = 2. The network maps the point (3, 4) onto the point (−5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at that vertex are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
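The sketch below illustrates the core of this procedure in R with igraph, for a single invented form with invented support values; the support thresholds and the repair step for missing boundary triphones described in this appendix are omitted.

```r
# Toy triphone graph for "#walks#": connect overlapping triphones,
# enumerate simple paths from the initial triphone, keep paths that end
# in a word-final triphone, and rank them by length-weighted support.
library(igraph)
triphones <- c("#wa", "wal", "alk", "lks", "ks#")
support   <- c(0.9, 0.8, 0.7, 0.6, 0.5)       # invented network supports
names(support) <- triphones

edges <- expand.grid(from = triphones, to = triphones, stringsAsFactors = FALSE)
edges <- edges[substr(edges$from, 2, 3) == substr(edges$to, 1, 2), ]
g <- graph_from_data_frame(edges, directed = TRUE,
                           vertices = data.frame(name = triphones))

paths    <- all_simple_paths(g, from = "#wa", mode = "out")
complete <- Filter(function(p) grepl("#$", tail(as_ids(p), 1)), paths)
score    <- sapply(complete, function(p) sum(support[as_ids(p)]) / length(p))
as_ids(complete[[which.max(score)]])          # best path: #wa wal alk lks ks#
```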

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1-82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221-242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148-203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890-894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245-248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272-11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106-154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13-22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583-602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543-555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531-573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305-341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89-97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211-240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15-23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203-208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393-413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249-284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1-38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271-294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131-154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309-332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176-181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75-99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64-99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96-104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267-291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657-667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662-720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39-52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363-386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276-296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151-160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909-957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927-960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438-481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271-1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50-64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290-311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174-1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485-515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1-15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259-284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13-25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236-244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517-1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651-1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728-1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653-669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning, an implementation in R. R package," 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730-1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5-42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171-1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47-67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1-47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191-1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585-600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232-270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836-1856, 1996.
[88] G. C. Van Orden, H. Kloos et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61-78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334-346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283-321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905-916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464-1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927-964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737-767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for test classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163-174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077-1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789-804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287-304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73-87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461-480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361-1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524-541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759-777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2-13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887-893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554-17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146-3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589-608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6-23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89-95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29-54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711-731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149-179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357-395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373-405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347-350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47-65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL. R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151-164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97-100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162-173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106-128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155-180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952-981, 2011.
[127] G. Csárdi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1-9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5-38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529-572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1-10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129-162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245-264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21-46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527-555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99-130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58-74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209-226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476-491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141-166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665-1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117-127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886-893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2-20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.

[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.

[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.

[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.

[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic Morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.

[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.

[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel island," Frontiers in Psychology, vol. 4, 2013.

[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.

[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide Learning for Auditory Comprehension," in Proceedings of Interspeech 2018, 2018.

[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.

[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.

[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.

[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.

[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.

[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.

[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.

[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.

[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.

[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss Functions for Image Restoration With Neural Networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.

[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.

[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.

[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.

[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.

[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.

[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.

[178] A. Kostic, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.

[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.

[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.

[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


Figure 4: Interaction of word length by activation diversity in GAMs fitted to plausibility ratings for complex words (left) and to semantic transparency ratings (right).

Table 3: GAM fitted to plausibility ratings for derivational neologisms with the derivational lexomes able, again, agent, ist, less, and not; data from Marelli and Baroni [60].

A. parametric coefficients              Estimate   Std. Error   t-value    p-value
Intercept                               -1.8072     2.7560      -0.6557     0.5127
Word Length                              0.5901     0.2113       2.7931     0.0057
Activation Diversity                     1.1221     0.6747       1.6632     0.0976
Arousal                                  0.3847     0.1521       2.5295     0.0121
Word Length x Activation Diversity      -0.1318     0.0500      -2.6369     0.0089
B. smooth terms                          edf        Ref.df       F-value    p-value
Random Intercepts Affix                  3.3610     4.0000      55.6723    < 0.0001

of the following matrix S:

S =
             one    two    three
    one      1.0    0.3    0.4
    two      0.2    1.0    0.1
    three    0.1    0.1    1.0                                (13)

We are interested in a transformation matrix F such that

CF = S.                                                       (14)

The transformation matrix is straightforward to obtain. Let C′ denote the Moore-Penrose generalized inverse of C, available in R as the ginv function in the MASS package [82]. Then

F = C′S.                                                      (15)
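For concreteness, (14) and (15) can be computed in a few lines of R. The sketch below assumes the toy C and S matrices given in the text; the triphone labels follow the column headers of (16) and (19) (the word-boundary marking of the second tu is our assumption), and all variable names are ours rather than code from the study.

library(MASS)
words <- c("one", "two", "three")
triphones <- c("wV", "wVn", "Vn", "tu", "tu#", "Tr", "Tri", "ri")
C <- matrix(c(1,1,1,0,0,0,0,0,
              0,0,0,1,1,0,0,0,
              0,0,0,0,0,1,1,1),
            nrow = 3, byrow = TRUE, dimnames = list(words, triphones))
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(words, words))
F <- ginv(C) %*% S          # F = C'S, cf. (15)
round(C %*% F, 2)           # recovers S for this toy example, cf. (14)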

For the present example,

F =
            one    two    three
    wV      0.33   0.10   0.13
    wVn     0.33   0.10   0.13
    Vn      0.33   0.10   0.13
    tu      0.10   0.50   0.05
    tu      0.10   0.50   0.05
    Tr      0.03   0.03   0.33
    Tri     0.03   0.03   0.33
    ri      0.03   0.03   0.33                                (16)

and for this simple example CF is exactly equal to S.

In the remainder of this section we investigate how well this very simple end-to-end model performs for visual word recognition as well as auditory word recognition. For visual word recognition we use the semantic matrix S developed in Section 3, but we consider two different cue matrices C: one using letter trigrams (following [58]) and one using phone trigrams. A comparison of the performance of the two models


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words; te: tensor product smooth, s: thin plate regression spline.

A. parametric coefficients                          Estimate   Std. Error   t-value    p-value
intercept [location]                                 5.5016     0.2564      21.4537    < 0.0001
intercept [scale]                                   -0.4135     0.0401     -10.3060    < 0.0001
B. smooth terms                                      edf        Ref.df      F-value    p-value
te(activation diversity, word length) [location]     5.3879     6.2897      23.7798      0.0008
s(derivational lexome) [location]                   14.3180    15.0000     380.6254    < 0.0001
s(activation diversity) [scale]                      2.1282     2.5993      28.4336    < 0.0001
s(word length) [scale]                               1.7722     2.1403     132.7311    < 0.0001

sheds light on the role of words' "sound image" on word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is most strongly correlated with the target semantic vector s_i of all target vectors s_j across all words j.
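As a sketch, this evaluation procedure amounts to a few lines of R, given a cue matrix C, a target semantic matrix S, and a mapping F (all names are illustrative):

Shat <- C %*% F                      # predicted semantic vectors, one row per word
r <- cor(t(Shat), t(S))              # r[i, j]: correlation of prediction i with target j
recognized <- max.col(r) == seq_len(nrow(S))
mean(recognized)                     # proportion of correctly recognized words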

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, to a total of 11480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP [98]), (iii)


the word being available in the CELEX database, and the word being of the abovementioned morphological type. In the present study we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset there were a total of 3465 unique trigrams, resulting in a 11480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10^-7. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.
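An orthographic cue matrix of this kind can be built along the following lines; this is a minimal sketch with a helper function of our own, not code from the study itself.

make_trigram_matrix <- function(words) {
  padded <- paste0("#", words, "#")                   # word-boundary markers (our assumption)
  grams <- lapply(padded, function(w)
    unique(substring(w, 1:(nchar(w) - 2), 3:nchar(w))))  # all letter triplets in each word
  types <- sort(unique(unlist(grams)))
  C <- matrix(0L, nrow = length(words), ncol = length(types),
              dimnames = list(words, types))
  for (i in seq_along(grams)) C[i, grams[[i]]] <- 1L
  C
}
C <- make_trigram_matrix(c("one", "two", "three"))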

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated in the mean with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derivational semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping a 11480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90 and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ1 obtained with network F of the first route, and a vector ŝ2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
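In outline, the two routes can be set up as further linear mappings. The sketch below assumes trigram and triphone indicator matrices Cletters and Cphones and the semantic matrix S; all names are ours, and the estimation shown here uses the same pseudoinverse approach as (15).

library(MASS)
Fdirect <- ginv(Cletters) %*% S        # first route: trigrams -> semantics
S1 <- Cletters %*% Fdirect             # shat_1
K <- ginv(Cletters) %*% Cphones        # network K: trigrams -> triphones
That <- Cletters %*% K                 # predicted triphone vectors
H <- ginv(That) %*% S                  # network H: predicted triphones -> semantics
S2 <- That %*% H                       # shat_2, the second route's semantic estimates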

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11480 words in the current dataset are also included in the BLP. We derived three measures from each of the two networks.

The first measure is the sum of the L1-norms of ŝ1 and ŝ2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ1 and ŝ2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
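For a single word, the three measures can be computed as follows, given its two predicted semantic vectors shat1 and shat2 and its column vector in S (a sketch; names are illustrative):

l1 <- function(x) sum(abs(x))
total_activation_diversity <- l1(shat1) + l1(shat2)
route_congruency <- cor(shat1, shat2)
prior <- l1(S[, lexome])               # L1-norm of the lexome's column vector in S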

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
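A GAM with the structure reported in Table 5 could be specified with the mgcv package roughly as follows; the data frame and the smooth specifications are our reconstruction of the model structure in the table, not the original analysis script.

library(mgcv)
m_ldl <- gam(invRT ~ word_type + word_length +
               s(total_activation_diversity) +
               te(route_congruency, prior),
             data = blp)
summary(m_ldl)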

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed up the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel). For medium range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of ŝ2, the AIC increased by 10059 units. However, when the model is based only on the activation diversity of ŝ1, the AIC increased by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger


Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures based on ldl; s: thin plate regression spline smooth, te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error   t-value    p-value
intercept                         -1.7774     0.0087     -205.414    < 0.0001
word type: inflected               0.1110     0.0059       18.742    < 0.0001
word type: monomorphemic           0.0233     0.0054        4.322    < 0.0001
word length                        0.0117     0.0011       10.981    < 0.0001
B. smooth terms                    edf        Ref.df       F          p-value
s(total activation diversity)      6.002      7.222        23.6      < 0.0001
te(route congruency, prior)       14.673     17.850        21.37     < 0.0001

Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures from ndl; te: tensor product smooth.

A. parametric coefficients                      Estimate   Std. Error   t-value    p-value
intercept                                       -1.6934     0.0086     -195.933    < 0.0001
word type: inflected                             0.0286     0.0052        5.486    < 0.0001
word type: monomorphemic                         0.0042     0.0055        0.770      0.4
word length                                      0.0066     0.0011        5.893    < 0.0001
B. smooth terms                                  edf        Ref.df       F          p-value
te(activation, activation diversity, prior)     41.34      49.72        10.84      < 0.0001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts,


Figure 7: Oscillogram for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean, for relatively clean parts where there is speech without background noise or music, and noisy, for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows (the transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus the transpose of the matrix with rows (3, 8) and (7, 2) is the matrix with rows (3, 7) and (8, 2). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed):

CF = S
C^T CF = C^T S
(C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
F = (C^T C)^{-1} C^T S                                        (17)

In this way, matrix inversion is required for a much smaller matrix, C^T C, which is a square matrix of size 40639 × 40639.
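In R, (17) amounts to solving the normal equations rather than computing a generalized inverse of the full cue matrix; a minimal sketch:

# crossprod(C)    computes t(C) %*% C  (here 40639 x 40639)
# crossprod(C, S) computes t(C) %*% S
F <- solve(crossprod(C), crossprod(C, S))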

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above we introduced the semantic matrix S (13), which we repeat here for convenience:

S =
             one    two    three
    one      1.0    0.3    0.4
    two      0.2    1.0    0.1
    three    0.1    0.1    1.0                                (18)

as well as the C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T =
             wV   wVn   Vn   tu   tu   Tr   Tri   ri
    one       1     1    1    0    0    0     0    0
    two       0     0    0    1    1    0     0    0
    three     0     0    0    0    0    1     1    1          (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T.                                                       (20)

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG.                                                       (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T,                                                   (22)

we have

G = S′T.                                                      (23)

Given G, we can predict the triphone matrix from the semantic matrix S:

SG = T̂.                                                      (24)

For the present example, the generalized inverse S′ of S is

S′ =
             one     two     three
    one      1.10   -0.29   -0.41
    two     -0.21    1.07   -0.02
    three   -0.09   -0.08    1.04                             (25)

and the transformation matrix G is

G =
             wV      wVn     Vn      tu      tu      Tr      Tri     ri
    one      1.10    1.10    1.10   -0.29   -0.29   -0.41   -0.41   -0.41
    two     -0.21   -0.21   -0.21    1.07    1.07   -0.02   -0.02   -0.02
    three   -0.09   -0.09   -0.09   -0.08   -0.08    1.04    1.04    1.04      (26)

For the present example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.

We made use of the S matrix which we derived from the TASA corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.
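The variance-based column selection just described can be sketched as follows, with T the triphone indicator matrix and S the full semantic matrix (names are ours):

n_dim <- ncol(T)                                   # number of distinct triphones
v <- apply(S, 2, var)                              # column variances of S
S_reduced <- S[, order(v, decreasing = TRUE)[1:n_dim]]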

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
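A simplified version of this ordering step is sketched below; it assumes a named vector a of triphone activations (from t = sG) and uses "#" as the word-boundary marker, which, like the way the edges are built, is our simplification of the algorithm detailed in the Appendix.

library(igraph)
cand <- names(a)[a > 0.99]                         # well-supported triphones
# connect triphone xyz to triphone yzw when they overlap in two phones
targets <- lapply(cand, function(x) cand[substr(cand, 1, 2) == substr(x, 2, 3)])
edges <- cbind(rep(cand, lengths(targets)), unlist(targets))
g <- graph_from_edgelist(edges)
starts <- intersect(cand[substr(cand, 1, 1) == "#"], V(g)$name)
paths <- unlist(lapply(starts, function(s) all_simple_paths(g, from = s)),
                recursive = FALSE)
best <- paths[[which.max(lengths(paths))]]         # longest path spans the word's form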

We also evaluated model performance with a second, more general, heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the Appendix, and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m_a} from S_m and add the semantic vector s_a of the affix:

S_a = S_{m_a} + i ⊗ s_a.                                      (27)

Here i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a,                                                (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

[ S_m ]          [ T_m ]
[ S_a ]  G_a  =  [ T_a ]                                      (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

[ S_m   ]         [ T_m   ]
[ S_a1  ]         [ T_a1  ]
[ S_a2  ]   G  =  [ T_a2  ]                                   (30)
[ ...   ]         [ ...   ]
[ S_an  ]         [ T_an  ]

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
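A sketch of (27) and (30) in R, in which the affix vector is added to the base words' semantic vectors and one joint mapping G is estimated; S_m, T_m, T_a, s_affix, and the row selection are illustrative names, not the original code.

library(MASS)
S_a   <- S_m[base_rows, ] + matrix(s_affix, nrow = length(base_rows),
                                   ncol = length(s_affix), byrow = TRUE)
S_aug <- rbind(S_m, S_a)              # stack base words and inflected words, cf. (30)
T_aug <- rbind(T_m, T_a)
G     <- ginv(S_aug) %*% T_aug        # one mapping for all inflectional functions, cf. (23)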

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the TreeTagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where as before we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the CELEX database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with CELEX phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.
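The cross-validation loop can be sketched as follows, assuming the augmented matrices S_aug and T_aug and a vector infl with the row indices of the inflected forms (all names are ours):

library(MASS)
folds <- split(sample(infl), rep(1:10, length.out = length(infl)))
for (heldout in folds) {
  train <- setdiff(seq_len(nrow(S_aug)), heldout)  # all stems plus 90% of inflected forms
  G <- ginv(S_aug[train, ]) %*% T_aug[train, ]
  That <- S_aug[heldout, , drop = FALSE] %*% G     # predicted triphone support, cf. (21)
  # each row of That is then ordered with the graph-based algorithm
}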

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

523 Performance on Derived Words When building thevector space model we distinguished between inflectionand word formation Inflected words did not receive theirown semantic vectors By contrast each derived word wasassigned its own lexome together with a lexome for itsderivational function Thus happiness was paired with twolexomes happiness and ness Since the semantic matrix forour dataset already contains semantic vectors for derivedwords we first investigated how well forms are predictedwhen derived words are assessed along with monolexomicwords without any further differentiation between the twoTo this end we constructed a semantic matrix S for 4885words (rows) by 4993 (columns) and constructed the trans-formation matrix G from this matrix and the correspondingtriphone matrix T The predicted form vectors of T =SG supported the targeted triphones above all other tri-phones without exception Furthermore the graph-basedalgorithm correctly reconstructed 99 of all forms withonly 5 cases where it assigned the correct form secondrank

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), for a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.
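A minimal sketch of such fold construction is given below, assuming a data frame derived with columns word, affix, and base, and a vector training_words holding the monolexomic and base words available to every fold; all of these names are hypothetical.

    set.seed(1)
    folds <- integer(nrow(derived))
    for (idx in split(seq_len(nrow(derived)), derived$affix)) {
      folds[idx] <- sample(rep(1:10, length.out = length(idx)))  # stratify folds by affix
    }
    derived$fold <- folds
    for (k in 1:10) {
      test  <- derived[derived$fold == k, ]
      train <- derived[derived$fold != k, ]
      # every held-out item's base word must be part of the training material
      stopifnot(all(test$base %in% c(train$word, training_words)))
      # ... estimate G on the training partition, evaluate on test ...
    }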

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that properly activates the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the CELEX target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.

Figure 9. The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI, henceforth 'branching segment'), for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL edge weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.

Table 7. Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients     Estimate    Std. Error    t-value     p-value
Intercept                      -1.9165     0.0592        -32.3908    < 0.0001
Log LDL Edge Weight            -0.1358     0.0400         -3.3960      0.0007
Phrase Rate                     0.0047     0.0021          2.2449      0.0248
Log Ncount                      0.1114     0.0189          5.8837    < 0.0001

B. smooth terms                   edf       Ref.df        F-value     p-value
TPRS log Frequency              1.9007     1.9050          7.5495      0.0004
random intercepts word       1149.8839  1323.0000         31.4919    < 0.0001
random intercepts speaker      30.2995    39.0000         34.5579    < 0.0001
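For readers who wish to see what such a model looks like in code, a sketch using the mgcv package is given below. The data frame and column names are hypothetical; the structure of the formula mirrors the summary in Table 7.

    library(mgcv)
    m <- bam(log(rel_dur) ~ log(ldl_edge_weight) + phrase_rate + log(ncount) +
               s(log(frequency), bs = "tp") +               # thin plate regression spline
               s(word, bs = "re") + s(speaker, bs = "re"),  # random intercepts (factors)
             data = buckeye)                                # one row per Buckeye token
    summary(m)                                              # compare with Table 7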

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs; [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
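By way of illustration, the following sketch lays out a small fragment of the blending graph with graphopt as implemented in igraph; the edge weights are illustrative placeholders, not the fitted LDL supports.

    library(igraph)
    edges <- data.frame(
      from   = c("#bl", "blE", "lEn", "End", "End", "ndI", "dIN"),
      to     = c("blE", "lEn", "End", "ndI", "ndz", "dIN", "IN#"),
      weight = c(1, 1, 1, 0.5, 0.23, 1, 1))
    g      <- graph_from_data_frame(edges, directed = TRUE)
    coords <- layout_with_graphopt(g)          # mass/charge/spring-based layout
    plot(g, layout = coords, edge.label = E(g)$weight)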

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the CELEX database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from CELEX and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the TASA corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the WEAVER model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to prevent graphs from becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one: it is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
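The logic of this toy example can be reproduced with a few lines of R. In the sketch below, random vectors stand in for the TASA-based semantic vectors, each word is paired with a single lexome, and letter_trigrams() is a small helper defined on the spot; the real model is trained on sentence tokens and therefore captures the ambiguity of John, which this simplification does not.

    library(MASS)
    letter_trigrams <- function(w) {                     # trigrams with boundary markers
      w <- paste0("#", w, "#")
      substring(w, 1:(nchar(w) - 2), 3:nchar(w))
    }
    words   <- c("John", "Clark", "Welsch", "Janet", "wrote")
    lexomes <- c("JohnClark", "JohnClark", "JohnWelsch", "JanetClark", "write")
    tris    <- sort(unique(unlist(lapply(words, letter_trigrams))))
    Cmat    <- t(sapply(words, function(w) as.numeric(tris %in% letter_trigrams(w))))
    dimnames(Cmat) <- list(words, tris)                  # cue matrix C
    set.seed(1)
    S     <- matrix(rnorm(4 * 10), 4, 10, dimnames = list(unique(lexomes), NULL))
    Lmat  <- S[lexomes, ]                                # one row per word: its lexome's vector
    Fmat  <- ginv(Cmat) %*% Lmat                         # solve C F = L
    L_hat <- Cmat %*% Fmat                               # estimated semantic vectors
    round(cor(t(L_hat), t(S)), 2)                        # estimated versus targeted vectors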

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input, at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from NDL networks using auditory targets to discriminate between lexomes [150].

Table 8. Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account. The sentences are listed first, followed by their lexome renditions.

John Clark wrote great books about ants.
John Clark published a great book about dangerous ants.
John Welsch makes great photographs of airplanes.
John Welsch has a collection of great photographs of airplanes landing.
John Wiggam will build great harpsichords.
John Wiggam will be a great expert in tuning harpsichords.
Anne Hastie has taught statistics to great second year students.
Anne Hastie has taught probability to great first year students.
Janet Clark teaches graph theory to second year students.

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

Figure 10. Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Figure 11. Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
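The update itself is a one-liner. The sketch below shows the Widrow-Hoff (delta) rule for a transformation matrix W mapping cue vectors onto target vectors; the learning rate eta is an illustrative setting, not a value used in the study.

    widrow_hoff_update <- function(W, cue, target, eta = 0.01) {
      error <- target - as.vector(cue %*% W)   # prediction error for this learning event
      W + eta * tcrossprod(cue, error)         # outer product of cue and error
    }
    # incremental learning over a sequence of events:
    # for (i in seq_len(nrow(Cues))) W <- widrow_hoff_update(W, Cues[i, ], Targets[i, ])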

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (30 or 300 for Estonian?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the southeast of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production, and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)
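Concretely, and assuming the semantic matrix S has rows for the content lexome and for each inflectional lexome (the row names below are illustrative only), the conjoint semantic vector for such a form is simply a sum of rows of S, which the production mapping G then projects onto triphone space.

    s_conjoint <- S["know", ] + S["first", ] + S["plural", ] + S["active", ] +
                  S["pluperfect", ] + S["subjunctive", ]
    # t_hat <- s_conjoint %*% G   # production: map the conjoint vector onto triphones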

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, LDL, and naive discriminative learning (NDL). Because NDL makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In LDL, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as NDL networks, but simply as linear discriminative learning networks. Actual incremental updating of LDL networks is computationally more costly, but further numerical optimization is possible and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes of older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units, such as stems and exponents, of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and on the other the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning, and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

a = (3, 4), \quad b = (-2, -3).    (A1)

Figure 12. Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

A = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}.    (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

B = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.    (A3)

Let us assume that points a and b represent the forms of two words ω_1 and ω_2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}.    (A4)

Such a mapping of A onto B is given by a matrix F,

F = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix},    (A5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.    (A6)

Using matrix notation, we can write

AF = B.    (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}.    (A8)

For numbers, the inverse of multiplication with s is dividing by s:

\left(\frac{1}{s}\right)(sx) = x.    (A9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

X^{-1} X = X X^{-1} = I.    (A10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

Y^{-1}(YX) = (XY) Y^{-1} = X.    (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

AF = B
A^{-1} A F = A^{-1} B
I F = A^{-1} B
F = A^{-1} B.    (A12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

BG = A.    (A13)

Solving this equation proceeds exactly as above:

BG = A
B^{-1} B G = B^{-1} A
I G = B^{-1} A
G = B^{-1} A.    (A14)


The inverse of B is

B^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix},    (A15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = G.    (A16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3, 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5, 2).    (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i_1 and i_2 are set to 3 and 4, respectively. To obtain the activations o_1 = -5 and o_2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o_1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o_2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

s = (2, 2),    (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2, 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2, 2).    (A19)
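These calculations are easily verified in R; solve() returns the exact inverse for these invertible 2 × 2 matrices, and MASS::ginv() gives the Moore-Penrose generalized inverse used for the singular matrices in the main text.

    library(MASS)
    A  <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows: a = (3, 4), b = (-2, -3)
    B  <- matrix(c(-5, 4, 2, -1), nrow = 2)   # rows: x = (-5, 2), y = (4, -1)
    Fm <- solve(A) %*% B                      # comprehension mapping, A F = B
    Gm <- solve(B) %*% A                      # production mapping,   B G = A
    A %*% Fm                                  # reproduces B
    c(2, 2) %*% Fm                            # maps the novel form vector s = (2, 2)
    all.equal(ginv(A), solve(A))              # ginv() coincides with solve() when A is invertible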

Figure 13. The linear network corresponding to the transformation matrix F. There are two input nodes, i_1 and i_2 (in blue), and two output nodes, o_1 and o_2 (in red). When the input vector (3, 4) is presented to the network, the value at o_1 is 3 × 1 + 4 × (-2) = -5, and that at o_2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
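A schematic implementation of the path search is given below. It assumes that the threshold-based vertex selection has already been carried out, so that edges is a data frame with columns from, to, and weight holding the network support for the selected triphone transitions, and that word boundaries are marked with # in the triphone names. This is a sketch of the heuristic, not the code used for the reported simulations.

    library(igraph)
    best_path <- function(edges) {
      g      <- graph_from_data_frame(edges, directed = TRUE)
      vnames <- V(g)$name
      starts <- vnames[grepl("^#", vnames)]        # word-initial triphones
      ends   <- vnames[grepl("#$", vnames)]        # word-final triphones
      paths  <- unlist(lapply(starts, function(v)
                  all_simple_paths(g, from = v, to = ends)), recursive = FALSE)
      if (length(paths) == 0) return(NULL)
      support <- sapply(paths, function(p)
                   sum(E(g, path = p)$weight) / length(p))  # weight support by path length
      paths[[which.max(support)]]                  # best supported path
    }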

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

Figure 14. Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis, Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL," R package version 1.0.1, 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA / London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Bearman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


Table 4: Gaussian location-scale additive model fitted to the semantic transparency ratings for derived words; te: tensor product smooth, s: thin plate regression spline.

A. parametric coefficients                              Estimate    Std. Error   t-value    p-value
intercept [location]                                      5.5016      0.2564      21.4537   < 0.0001
intercept [scale]                                        -0.4135      0.0401     -10.3060   < 0.0001
B. smooth terms                                           edf         Ref.df      F-value    p-value
te(activation diversity, word length) [location]          5.3879      6.2897      23.7798     0.0008
s(derivational lexome) [location]                       143.1800    150.0000     380.6254   < 0.0001
s(activation diversity) [scale]                           2.1282      2.5993      28.4336   < 0.0001
s(word length) [scale]                                    1.7722      2.1403     132.7311   < 0.0001

sheds light on the role of words' "sound image" on word recognition in reading (cf. [87–89] for phonological effects in visual word recognition). For auditory word recognition, we make use of the acoustic features developed in Arnold et al. [18].

Although the features that we selected as cues are based on domain knowledge, they do not have an ontological status in our framework, in contrast to units such as phonemes and morphemes in standard linguistic theories. In these theories, phonemes and morphemes are Russellian atomic units of formal phonological and syntactic calculi, and considerable research has been directed towards showing that these units are psychologically real. For instance, speech errors involving morphemes have been interpreted as clear evidence for the existence in the mind of morphemes [90]. Research has been directed at finding behavioral and neural correlates of phonemes and morphemes [6, 91, 92], ignoring fundamental criticisms of both the phoneme and the morpheme as theoretical constructs [25, 93, 94].

The features that we use for representing aspects of form are heuristic features that have been selected or developed primarily because they work well as discriminators. We will gladly exchange the present features for other features if these other features can be shown to afford higher discriminative accuracy. Nevertheless, the features that we use are grounded in domain knowledge. For instance, the letter trigrams used for modeling reading (see, e.g., [58]) are motivated by the finding in stylometry that letter trigrams are outstanding features for discriminating between authorial hands [95]. (In what follows, we work with letter triplets and triphones, which are basically contextually enriched letter and phone units. For languages with strong phonotactic restrictions, such that the syllable inventory is quite small (e.g., Vietnamese), digraphs appear to work better than trigraphs [96]. Baayen et al. [85] show that working with four-grams may enhance performance, and current work in progress on many other languages shows that for highly inflecting languages with long words, 4-grams may outperform 3-grams. For computational simplicity, we have not experimented with mixtures of units of different length, nor with algorithms with which such units might be determined.) Letter n-grams are also posited by Cohen and Dehaene [97] for the visual word form system, at the higher end of a hierarchy of neurons tuned to increasingly large visual features of words.
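For concreteness, the following minimal R sketch shows how letter trigrams can be extracted as form cues. It is our illustration, not the authors' implementation; we assume that the word boundary is marked with "#", and the function name is ours.

    # Extract overlapping letter trigrams from a word, with "#" marking the word boundary.
    letter_trigrams <- function(word) {
      w <- paste0("#", word, "#")
      n <- nchar(w)
      substring(w, 1:(n - 2), 3:n)
    }
    letter_trigrams("crisis")
    # "#cr" "cri" "ris" "isi" "sis" "is#"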

Triphones, the units that we use for representing the 'acoustic image' or 'auditory verbal imagery' of canonical word forms, have the same discriminative potential as letter triplets, but have as an additional advantage that they do better justice, compared to phonemes, to phonetic contextual interdependencies, such as plosives being differentiated primarily by formant transitions in adjacent vowels. The acoustic features that we use for modeling auditory comprehension, to be introduced in more detail below, are motivated in part by the sensitivity of specific areas on the cochlea to different frequencies in the speech signal.

Evaluation of model predictions proceeds by comparing the predicted semantic vector ŝ, obtained by multiplying an observed cue vector c with the transformation matrix F (ŝ = cF), with the corresponding target row vector s of S. A word i is counted as correctly recognized when ŝ_i is most strongly correlated with the target semantic vector s_i of all target vectors s_j across all words j.
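The following toy sketch in R makes this evaluation procedure concrete. It is our illustration, not the authors' code: the cue matrix C and semantic matrix S are small random stand-ins, and MASS::ginv (from the MASS package shipped with R) is used here to estimate the mapping F.

    set.seed(1)
    C <- matrix(rbinom(5 * 8, 1, 0.4), nrow = 5)   # 5 toy words x 8 toy form cues
    S <- matrix(rnorm(5 * 4), nrow = 5)            # 5 toy words x 4 semantic dimensions
    F <- MASS::ginv(C) %*% S                       # linear mapping from form to meaning
    S_hat <- C %*% F                               # predicted semantic vectors, one row per word
    r <- cor(t(S_hat), t(S))                       # r[i, j]: correlation of prediction i with target j
    accuracy <- mean(apply(r, 1, which.max) == seq_len(nrow(S)))
    accuracy                                       # proportion of words whose own target wins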

Inflected words do not have their own semantic vectors in S. We therefore created semantic vectors for inflected words by adding the semantic vectors of stem and affix, and added these as additional row vectors to S before calculating the transformation matrix F.

We note here that there is only one transformation matrix, i.e., one discrimination network, that covers all affixes, inflectional and derivational, as well as derived and monolexomic (simple) words. This approach contrasts with that of Marelli and Baroni [60], who pair every affix with its own transformation matrix.

4.2. Visual Comprehension. For visual comprehension, we first consider a model straightforwardly mapping form vectors onto semantic vectors. We then expand the model with an indirect route, first mapping form vectors onto the vectors for the acoustic image (derived from words' triphone representations), and then mapping the acoustic image vectors onto the semantic vectors.

The dataset on which we evaluated our models comprised 3987 monolexomic English words, 6595 inflected variants of monolexomic words, and 898 derived words with monolexomic base words, for a total of 11,480 words. These counts follow from the simultaneous constraints of (i) a word appearing with sufficient frequency in TASA, (ii) the word being available in the British Lexicon Project (BLP, [98]), and (iii) the word being available in the CELEX database and being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3465 unique trigrams, resulting in an 11,480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10^-7. From S and C we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, the performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated, in the mean, with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = −0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derived words' semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncrasies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route from Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above; however, a new transformation matrix F was obtained for mapping an 11,480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks: one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words, the correct sequence of triphones jointly spanning the word's sequence of letters received maximal support.
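As a toy illustration of the graph-based idea, the igraph package [127] can be used to enumerate candidate triphone paths, here for the word hand. This is only a sketch: the actual algorithm described in the Appendix also weights vertices and edges by the network support that K provides for the triphones, and selects the best-supported path.

    library(igraph)
    # Vertices are triphones; an edge links triphones that overlap in two phones.
    edges <- data.frame(
      from = c("#ha", "han", "and"),
      to   = c("han", "and", "nd#")
    )
    g <- graph_from_data_frame(edges, directed = TRUE)
    paths <- all_simple_paths(g, from = "#ha", to = "nd#")   # candidate orderings
    lapply(paths, as_ids)                                    # "#ha" "han" "and" "nd#"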

We then constructed a matrix T with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector s1 obtained with network F of the first route, and a vector s2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
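A minimal toy sketch of this dual-route setup is given below. The notation follows the text, but the matrices are random stand-ins of ours, and MASS::ginv is used merely to obtain the linear mappings; this is not the authors' code.

    set.seed(42)
    n   <- 6                                    # toy number of words
    C_g <- matrix(rbinom(n * 10, 1, 0.4), n)    # words x letter trigrams (toy)
    C_p <- matrix(rbinom(n * 12, 1, 0.4), n)    # words x triphones (toy)
    S   <- matrix(rnorm(n * n), n)              # words x semantic dimensions (toy, square for convenience)
    K     <- MASS::ginv(C_g) %*% C_p            # route 2, step 1: trigrams -> triphones
    T_hat <- C_g %*% K                          # predicted triphone vectors (the matrix T)
    H     <- MASS::ginv(T_hat) %*% S            # route 2, step 2: predicted triphones -> semantics
    F     <- MASS::ginv(C_g) %*% S              # route 1: trigrams -> semantics
    s1 <- C_g %*% F                             # first-route semantic estimates
    s2 <- T_hat %*% H                           # second-route semantic estimates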

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11,480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of s1 and s2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of s1 and s2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
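Continuing the toy objects of the dual-route sketch above, the three measures can be computed as follows (our illustration; the variable names are ours, and in the paper S is a lexome-by-lexome matrix, so that each word has its own column).

    total_activation_diversity <- rowSums(abs(s1)) + rowSums(abs(s2))  # sum of the two L1-norms
    route_congruency <- sapply(seq_len(n), function(i) cor(s1[i, ], s2[i, ]))
    prior <- colSums(abs(S))                                           # L1-norm of each lexome's column in S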

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, and with word length and word type (derived, inflected, and monolexomic, with derived as reference level) as control predictors. As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel): for all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
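To make this modeling step concrete, the sketch below shows how a GAM of this general form can be fitted with the mgcv package [132]. The data frame blp and its column names are hypothetical stand-ins of ours for the actual BLP data and measures, and the toy data are random, so the fitted effects are meaningless; the formula structure mirrors the smooth terms reported in Table 5.

    library(mgcv)
    set.seed(1)
    blp <- data.frame(                          # toy stand-in for the BLP dataset
      invRT = rnorm(500),
      word_type = factor(sample(c("derived", "inflected", "monolexomic"), 500, TRUE)),
      word_length = sample(3:10, 500, TRUE),
      total_activation_diversity = runif(500, 0, 5),
      route_congruency = runif(500),
      prior = runif(500, 0.5, 2)
    )
    m_ldl <- bam(invRT ~ word_type + word_length +
                   s(total_activation_diversity) +       # thin plate regression spline
                   te(route_congruency, prior),          # tensor product interaction
                 data = blp)
    summary(m_ldl)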

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors of RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel): for medium-range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 651.01 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of s2, the AIC increased by 100.59 units. However, when the model is based only on the activation diversity of s1, the AIC increased by no less than 147.90 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part bedue to the number of triphone features being larger


Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures based on ldl; s: thin plate regression spline smooth, te: tensor product smooth.

A. parametric coefficients          Estimate    Std. Error   t-value    p-value
intercept                            -1.7774      0.0087    -205.414    < 0.001
word type: Inflected                  0.1110      0.0059      18.742    < 0.001
word type: Monomorphemic              0.0233      0.0054       4.322    < 0.001
word length                           0.0117      0.0011      10.981    < 0.001
B. smooth terms                       edf         Ref.df      F          p-value
s(total activation diversity)         6.002       7.222       23.60     < 0.001
te(route congruency, prior)          14.673      17.850       21.37     < 0.001


Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures from ndl; te: tensor product smooth.

A. parametric coefficients                     Estimate    Std. Error   t-value    p-value
intercept                                       -1.6934      0.0086    -195.933    < 0.001
word type: Inflected                             0.0286      0.0052       5.486    < 0.001
word type: Monomorphemic                         0.0042      0.0055       0.770    = 0.4
word length                                      0.0066      0.0011       5.893    < 0.001
B. smooth terms                                  edf         Ref.df      F          p-value
te(activation, activation diversity, prior)     41.34       49.72       10.84      < 0.001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations; https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as its primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of the words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL-scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts,


Figure 7: Oscillogram for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).


Figure 8: The oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, is shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts, where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, for a total of 131,673 word tokens (representing 4779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4609 columns, the dimension of

the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle, and vice versa. Thus $\left(\begin{smallmatrix}3 & 8\\ 7 & 2\end{smallmatrix}\right)^{T} = \left(\begin{smallmatrix}3 & 7\\ 8 & 2\end{smallmatrix}\right)$. For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus, a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

\[
\begin{aligned}
\mathbf{C}\mathbf{F} &= \mathbf{S} \\
\mathbf{C}^{T}\mathbf{C}\mathbf{F} &= \mathbf{C}^{T}\mathbf{S} \\
(\mathbf{C}^{T}\mathbf{C})^{-1}(\mathbf{C}^{T}\mathbf{C})\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S} \\
\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S}
\end{aligned}
\qquad (17)
\]

In this way, matrix inversion is required only for the much smaller matrix C^T C, which is a square matrix of size 40,639 × 40,639.
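The following numpy sketch shows this estimation step; the function name and variable names are our own, and in practice a small ridge term may be needed if C^T C happens to be singular.

```python
import numpy as np

def estimate_F(C, S):
    # normal equations of Eq. (17): only the square matrix C'C (FBSFs x FBSFs)
    # has to be inverted, rather than a generalized inverse of the much larger C
    CtC = C.T @ C
    CtS = C.T @ S
    return np.linalg.solve(CtC, CtS)

# predicted semantic vectors for the audio tokens: S_hat = C @ estimate_F(C, S)
```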

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the number of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.
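A minimal sketch of this evaluation, with our own variable names, is given below: it computes row-wise Pearson correlations between predicted and gold semantic vectors and counts a token as correctly recognized when the highest correlation is obtained for its own gold vector. (For the full dataset the correlation matrix is very large; the sketch is intended for modest numbers of tokens.)

```python
import numpy as np

def precision_at_1(S_hat, S_gold):
    # row-wise Pearson correlations between predicted and gold semantic vectors
    Zp = (S_hat - S_hat.mean(1, keepdims=True)) / (S_hat.std(1, keepdims=True) + 1e-12)
    Zg = (S_gold - S_gold.mean(1, keepdims=True)) / (S_gold.std(1, keepdims=True) + 1e-12)
    corr = Zp @ Zg.T / S_hat.shape[1]      # tokens x tokens correlation matrix
    best = corr.argmax(axis=1)             # gold vector best matching each prediction
    return float((best == np.arange(S_hat.shape[0])).mean())
```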

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11,480-word classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1,000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

\[
\mathbf{S} =
\begin{array}{r|ccc}
 & \text{one} & \text{two} & \text{three} \\
\hline
\text{one} & 1.0 & 0.3 & 0.4 \\
\text{two} & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array}
\qquad (18)
\]

as well as the indicator matrix C specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to it as the T matrix:


\[
\mathbf{T} =
\begin{array}{r|cccccccc}
 & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\
\hline
\text{one} & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two} & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}
\qquad (19)
\]

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

\[
\mathbf{S}\mathbf{G} = \mathbf{T}. \qquad (20)
\]

Given G, we can predict for any semantic vector s the vector t̂ that quantifies the support for the triphones provided by s, simply by multiplying s with G:

\[
\hat{\mathbf{t}} = \mathbf{s}\mathbf{G}. \qquad (21)
\]

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

\[
\mathbf{S}'\mathbf{S}\mathbf{G} = \mathbf{S}'\mathbf{T}, \qquad (22)
\]

we have

\[
\mathbf{G} = \mathbf{S}'\mathbf{T}. \qquad (23)
\]

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

\[
\mathbf{S}\mathbf{G} = \hat{\mathbf{T}}. \qquad (24)
\]

For the present example, the generalized inverse of S, S′, is

\[
\mathbf{S}' =
\begin{array}{r|ccc}
 & \text{one} & \text{two} & \text{three} \\
\hline
\text{one} & 1.10 & -0.29 & -0.41 \\
\text{two} & -0.21 & 1.07 & -0.02 \\
\text{three} & -0.09 & -0.08 & 1.04
\end{array}
\qquad (25)
\]

and the transformation matrix G is

\[
\mathbf{G} =
\begin{array}{r|cccccccc}
 & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\
\hline
\text{one} & 1.10 & 1.10 & 1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \\
\text{two} & -0.21 & -0.21 & -0.21 & 1.07 & 1.07 & -0.02 & -0.02 & -0.02 \\
\text{three} & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 & 1.04 & 1.04 & 1.04
\end{array}
\qquad (26)
\]

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
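The toy example can be reproduced with a few lines of numpy; the matrices are those of (18) and (19), and the generalized inverse is computed with numpy's pinv.

```python
import numpy as np

# semantic matrix S (18); rows and columns: one, two, three
S = np.array([[1.0, 0.3, 0.4],
              [0.2, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

# triphone matrix T (19); columns: #wV wVn Vn# #tu tu# #Tr Tri ri#
T = np.array([[1, 1, 1, 0, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, 1]], dtype=float)

S_prime = np.linalg.pinv(S)          # Moore-Penrose generalized inverse, cf. (25)
G = S_prime @ T                      # Eq. (23): G = S'T, cf. (26)
T_hat = S @ G                        # Eq. (24)

print(np.round(S_prime, 2))
print(np.round(T_hat, 2))            # virtually identical to T
```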

We made use of the S matrix, which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4,500. In summary, S denotes a w × n matrix that specifies, for each of w words, an n-dimensional semantic vector.
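A small sketch of this dimension reduction step, with assumed variable and function names:

```python
import numpy as np

def reduce_columns(S, n):
    # retain the n columns of S with the highest variance, with n set to the
    # number of distinct triphones in the dataset
    keep = np.argsort(S.var(axis=0))[::-1][:n]
    return S[:, np.sort(keep)]
```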

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3,987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3,987 × 4,446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3,987 × 4,446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m): those column vectors of S_m were retained that had the 4,446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
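The following sketch illustrates the logic of this first evaluation step, using the networkx library in place of the igraph all_simple_paths function used here, and assuming single-character phone symbols with '#' marking word boundaries in triphones; the function reconstruct_form and the dictionary of triphone activations are our own. The more general algorithm of the appendix, with its bridging triphones and length penalty, is not shown.

```python
import networkx as nx

def reconstruct_form(triphone_support, threshold=0.99):
    # triphone_support: dict mapping triphones such as "#wV", "wVn", "Vn#" to activations
    active = [t for t, a in triphone_support.items() if a > threshold]
    graph = nx.DiGraph()
    graph.add_nodes_from(active)
    for t1 in active:
        for t2 in active:
            if t1 != t2 and t1[1:] == t2[:-1]:      # the two triphones overlap in two segments
                graph.add_edge(t1, t2)
    starts = [t for t in active if t.startswith("#")]   # word-initial triphones
    ends = [t for t in active if t.endswith("#")]       # word-final triphones
    paths = [p for s in starts for e in ends if s != e
             for p in nx.all_simple_paths(graph, s, e)]
    if not paths:
        return None
    best = max(paths, key=len)                          # select the longest path
    return best[0] + "".join(t[-1] for t in best[1:])   # collapse the path into a phone string

# e.g. reconstruct_form({"#wV": 1.0, "wVn": 0.999, "Vn#": 1.0}) -> "#wVn#"
```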

We also evaluated model performance with a second, more general, heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3,982 out of 3,987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second, due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4,000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3,987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m_a} from S_m and add the semantic vector s_a of the affix:

\[
\mathbf{S}_a = \mathbf{S}_{m_a} + \mathbf{i} \otimes \mathbf{s}_a. \qquad (27)
\]

Here, i is a unit vector of length k and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

\[
\mathbf{S}_a \mathbf{G}_a = \mathbf{T}_a, \qquad (28)
\]

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:
\[
\begin{bmatrix} \mathbf{S}_m \\ \mathbf{S}_a \end{bmatrix} \mathbf{G}_a =
\begin{bmatrix} \mathbf{T}_m \\ \mathbf{T}_a \end{bmatrix}. \qquad (29)
\]

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

\[
\begin{bmatrix} \mathbf{S}_m \\ \mathbf{S}_{a_1} \\ \mathbf{S}_{a_2} \\ \vdots \\ \mathbf{S}_{a_n} \end{bmatrix} \mathbf{G} =
\begin{bmatrix} \mathbf{T}_m \\ \mathbf{T}_{a_1} \\ \mathbf{T}_{a_2} \\ \vdots \\ \mathbf{T}_{a_n} \end{bmatrix}. \qquad (30)
\]

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
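A compact sketch of this set-up in numpy is given below. The function name and the representation of the inflectional datasets as (base rows, affix vector, triphone matrix) triples are our own assumptions; broadcasting the affix vector over the rows implements the Kronecker stacking of (27).

```python
import numpy as np

def joint_inflection_mapping(S_m, T_m, inflectional_sets):
    S_blocks, T_blocks = [S_m], [T_m]
    for S_base, s_affix, T_a in inflectional_sets:
        S_blocks.append(S_base + s_affix)   # Eq. (27): add the affix vector to each base row
        T_blocks.append(T_a)
    S_aug = np.vstack(S_blocks)             # augmented matrices of Eq. (30)
    T_aug = np.vstack(T_blocks)
    return np.linalg.pinv(S_aug) @ T_aug    # single mapping G for all inflectional functions
```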

We selected 6,595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2,401 plurals, 1,333 continuous forms (e.g., walking), 859 past tense forms, and 1,086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10,582 × 5,483 semantic matrix S, where, as before, we retained the 5,483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for the inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6,236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.
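In code, one fold of this procedure amounts to the following sketch (variable and function names are ours):

```python
import numpy as np

def predict_held_out_form(S_train, T_train, s_base, s_affix):
    G = np.linalg.pinv(S_train) @ T_train   # mapping estimated on the training fold only
    s_new = s_base + s_affix                # semantic vector of the unseen inflected form
    return s_new @ G                        # Eq. (21): predicted triphone support vector
```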

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense of blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Dz], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4,885 words (rows) by 4,993 dimensions (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones, without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), for a total of 770 complex words.

We first combined these derived words with the 3,987 monolexomic words. For the resulting 4,885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15,105 tokens of a total of 1,327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL edge weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.

The adverse effects of weaknesses in a word's path in the graph, where the path branches, are of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs; [139–141]).


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients      Estimate    Std. Error   t-value     p-value
Intercept                       -1.9165     0.0592       -32.3908    < 0.0001
Log LDL Edge Weight             -0.1358     0.0400       -3.3960     0.0007
Phrase Rate                      0.0047     0.0021        2.2449     0.0248
Log Ncount                       0.1114     0.0189        5.8837     < 0.0001

B. smooth terms                  edf         Ref.df       F-value    p-value
TPRS log Frequency               1.9007      1.9050       7.5495     0.0004
random intercepts word           1149.8839   1323.0000    31.4919    < 0.0001
random intercepts speaker        30.2995     39.0000      34.5579    < 0.0001

TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM, and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4. Discussion. Our production network reconstructs known words' forms with high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus, and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid graphs becoming too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required, one that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L, specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...), was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
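The following toy sketch, with hypothetical data and one-hot lexome targets standing in for the semantic vectors used in the text, illustrates the mechanics: trigram cue vectors are mapped onto named-entity lexomes, and the vector predicted for John Welsch is the sum of the vectors predicted for its two parts (here the two words' trigram sets do not overlap, so presenting both sets jointly is equivalent to summing).

```python
import numpy as np

def trigrams(word):
    padded = "#" + word + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# hypothetical learning events: orthographic word -> lexome of the named entity
events = [("John", "JohnClark"), ("Clark", "JohnClark"),
          ("John", "JohnWelsch"), ("Welsch", "JohnWelsch"),
          ("Janet", "JanetClark"), ("Clark", "JanetClark")]

cues = sorted({t for w, _ in events for t in trigrams(w)})
lexomes = sorted({l for _, l in events})

C = np.zeros((len(events), len(cues)))       # trigram cue matrix
L = np.zeros((len(events), len(lexomes)))    # lexome target matrix (one-hot here)
for i, (word, lexome) in enumerate(events):
    for t in trigrams(word):
        C[i, cues.index(t)] = 1
    L[i, lexomes.index(lexome)] = 1

F = np.linalg.pinv(C) @ L                    # least-squares solution of CF = L

def predict(words):
    c = np.zeros(len(cues))
    for w in words:
        for t in trigrams(w):
            c[cues.index(t)] = 1
    return {l: round(float(v), 2) for l, v in zip(lexomes, c @ F)}

print(predict(["John"]))             # JohnClark and JohnWelsch equally supported
print(predict(["John", "Welsch"]))   # support now concentrates on JohnWelsch
```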

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about which named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account. Each sentence (left) is paired with its lexomic representation (right).

John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student


Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.


Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.


Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.


Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
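For concreteness, the following sketch shows the Widrow-Hoff (delta rule) update applied one learning event at a time; the function name, the learning rate, and the zero initialization are our own choices.

```python
import numpy as np

def widrow_hoff(cue_matrix, target_matrix, eta=0.01):
    F = np.zeros((cue_matrix.shape[1], target_matrix.shape[1]))
    for c, t in zip(cue_matrix, target_matrix):   # one learning event at a time
        c = c.reshape(1, -1)
        t = t.reshape(1, -1)
        F += eta * c.T @ (t - c @ F)              # update proportional to the prediction error
    return F
```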

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]), and has led to the belief that the 'architecture of the language faculty' is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1,000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapivissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4,000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4,000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older, and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes of older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning), nor morphological form units such as stems and exponents, of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis, compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\[
\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A1}
\]


Figure 12: Points a and b can be transformed into points p and q by a linear transformation. (Axes: x and y.)

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\[
\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2}
\]

In addition to the points a and b, Figure 12 also shows two other datapoints, x and y. The matrix for these two points is

\[
\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3}
\]

Let us assume that points a and b represent the forms of two words ω₁ and ω₂, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:
\[
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4}
\]

Such a mapping of A onto B is given by a matrix F

\[
\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5}
\]

and it is straightforward to verify that indeed
\[
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6}
\]

Using matrix notation, we can write
\[
\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A7}
\]

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8}
\]

For numbers, the inverse of multiplication with s is dividing by s:
\[
\left(\frac{1}{s}\right)(sx) = x. \tag{A9}
\]

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:
\[
\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10}
\]

For nonsquare matrices, the inverse Y⁻¹ is defined such that
\[
\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11}
\]

We find the square matrix F that maps the square matrix A onto B as follows:
\[
\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B}, \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}, \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}, \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \tag{A12}
\]

Since for the present example A and its inverse happen to be identical (A⁻¹ = A), we obtain (A6).
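These calculations are easy to verify in R. The following minimal sketch uses base R only; the names Fmat and Gmat stand for the matrices F and G of the text:

```r
# The 2 x 2 example worked out in (A2)-(A16)
A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)

Fmat <- solve(A) %*% B   # F = A^{-1} B, cf. (A12); here A^{-1} happens to equal A
A %*% Fmat               # reproduces B, cf. (A6) and (A7)

Gmat <- solve(B) %*% A   # G = B^{-1} A, cf. (A14)-(A16)
B %*% Gmat               # reproduces A, cf. (A13)
```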

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\[
\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A13}
\]

Solving this equation proceeds exactly as above:

\[
\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A}, \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}, \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}, \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \tag{A14}
\]


The inverse of B is

\[
\mathbf{B}^{-1} = \begin{pmatrix} \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{4}{3} & \tfrac{5}{3} \end{pmatrix}, \tag{A15}
\]

and hence we have

\[
\begin{pmatrix} \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{4}{3} & \tfrac{5}{3} \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
=
\begin{pmatrix} -\tfrac{1}{3} & -\tfrac{2}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix}
= \mathbf{G}. \tag{A16}
\]

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
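A minimal sketch of how such a generalized inverse can be obtained with the MASS package mentioned above (the singular example matrix is ours, chosen for illustration only):

```r
library(MASS)   # provides ginv(), which relies on LAPACK

# A singular (rank-deficient) matrix: its second row is twice its first,
# so solve(X) would fail here
X  <- matrix(c(1, 2, 2, 4), nrow = 2, byrow = TRUE)
Xp <- ginv(X)            # Moore-Penrose generalized inverse, X' in our notation
X %*% Xp %*% X           # returns X (up to rounding error)
```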

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:
\[
(3, 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5, 2). \tag{A17}
\]

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i₁ and i₂ are set to 3 and 4, respectively. To obtain the activations o₁ = -5 and o₂ = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o₁ we obtain the value 3 × 1 + 4 × -2 = -5, and for o₂ we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\[
\mathbf{s} = (2, 2), \tag{A18}
\]

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
\[
(2, 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2, 2). \tag{A19}
\]
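The equivalence of the matrix and the network can be made concrete with a few lines of R; in this sketch the weight matrix W plays the role of the connection weights of Figure 13 (i.e., of F):

```r
# Rows of W correspond to the input nodes i1, i2; columns to the output nodes o1, o2
W <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)

c(3, 4) %*% W   # trained form vector:  (3, 4) -> (-5, 2)
c(2, 2) %*% W   # novel form vector:    (2, 2) -> (-2, 2), cf. (A19)
```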

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i₁ and i₂ (in blue), and two output nodes, o₁ and o₂ (in red). Form features go in at the input nodes, the connection weights are the transformation matrix, and semantic vectors come out at the output nodes. When the input vector (3, 4) is presented to the network, the value at o₁ is 3 × 1 + 4 × -2 = -5, and that at o₂ is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
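The core of this procedure can be sketched with the igraph package; all_simple_paths, as_ids, and graph_from_edgelist are igraph functions, but the toy triphones and support values below are our own illustrative stand-ins for the quantities computed by the model:

```r
library(igraph)

# A toy word with hypothetical triphones and support values; "#" marks the word boundary
triphones <- c("#so", "son", "on#")
support   <- c("#so" = 0.6, "son" = 0.4, "on#" = 0.5)

# Edge from v1 to v2 when segments 2-3 of v1 are identical to segments 1-2 of v2
el <- NULL
for (v1 in triphones) {
  for (v2 in triphones) {
    if (substr(v1, 2, 3) == substr(v2, 1, 2)) el <- rbind(el, c(v1, v2))
  }
}
g <- graph_from_edgelist(el, directed = TRUE)

# All simple paths from the initial vertex; keep those reaching a final vertex
paths  <- all_simple_paths(g, from = "#so")
final  <- sapply(paths, function(p) grepl("#$", tail(as_ids(p), 1)))
# Length-weighted path support: raw support divided by path length
scores <- sapply(paths[final], function(p) sum(support[as_ids(p)]) / length(p))
as_ids(paths[final][[which.max(scores)]])   # best path: "#so" "son" "on#"
```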

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning "orthographic" representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning: an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the "cost" of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL," R package version 1.0.1, 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csárdi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Bearman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215-216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol, [Ph.D. thesis], University of Alberta, Edmonton, 2010.




the word being available in the CELEX database, and the word being of the abovementioned morphological type. In the present study, we did not include inflected variants of derived words, nor did we include compounds.

4.2.1. The Direct Route: Straight from Orthography to Semantics. To model single word reading, as gauged by the visual lexical decision task, we used letter trigrams as cues. In our dataset, there were a total of 3465 unique trigrams, resulting in a 11480 × 3465 orthographic cue matrix C. The semantic vectors of the monolexomic and derived words were taken from the semantic weight matrix described in Section 3. For inflected words, semantic vectors were obtained by summation of the semantic vectors of base words and inflectional functions. For the resulting semantic matrix S, we retained the 5030 column vectors with the highest variances, setting the cutoff value for the minimal variance to 0.34 × 10⁻⁷. From S and C, we derived the transformation matrix F, which we used to obtain estimates ŝ of the semantic vectors s in S.
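In terms of the linear algebra of Appendix A, this estimation step amounts to solving CF = S with the generalized inverse. A schematic R sketch (the matrices C and S are assumed to be available as described in the text; the evaluation by row-wise correlation is spelled out for exposition, not efficiency):

```r
library(MASS)

# C: 11480 x 3465 matrix of letter trigram cues (0/1)
# S: 11480 x 5030 matrix of semantic vectors
Fmat  <- ginv(C) %*% S        # transformation matrix F: form -> meaning
S_hat <- C %*% Fmat           # estimated semantic vectors (s-hat)

# A word counts as correctly recognized when its estimated vector correlates
# most strongly with its own targeted row of S
r <- cor(t(S_hat), t(S))                         # r[i, j] = cor(S_hat[i, ], S[j, ])
accuracy <- mean(apply(r, 1, which.max) == seq_len(nrow(S)))
```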

For 59% of the words, ŝ had the highest correlation with the targeted semantic vector s (for the correctly predicted words, the mean correlation was 0.83 and the median was 0.86; with respect to the incorrectly predicted words, the mean and median correlation were both 0.48). The accuracy obtained with naive discriminative learning, using orthogonal lexomes as outcomes instead of semantic vectors, was 27%. Thus, as expected, performance of linear discriminative learning (ldl) is substantially better than that of naive discriminative learning (ndl).

To assess whether this model is productive, in the sense that it can make sense of novel complex words, we considered a separate set of inflected and derived words which were not included in the original dataset. For an unseen complex word, both the base word and the inflectional or derivational function appeared in the training set, but not the complex word itself. The network F therefore was presented with a novel form vector, which it mapped onto a novel vector in the semantic space. Recognition was successful if this novel vector was more strongly correlated with the semantic vector obtained by summing the semantic vectors of base and inflectional or derivational function than with any other semantic vector.

Of 553 unseen inflected words, 43% were recognized successfully. The semantic vectors predicted from their trigrams by the network F were overall well correlated in the mean with the targeted semantic vectors of the novel inflected words (obtained by summing the semantic vectors of their base and inflectional function): r = 0.67, p < 0.0001. The predicted semantic vectors also correlated well with the semantic vectors of the inflectional functions (r = 0.61, p < 0.0001). For the base words, the mean correlation dropped to r = 0.28 (p < 0.0001).

For unseen derived words (514 in total), we also calculated the predicted semantic vectors from their trigram vectors. The resulting semantic vectors ŝ had moderate positive correlations with the semantic vectors of their base words (r = 0.40, p < 0.0001), but negative correlations with the semantic vectors of their derivational functions (r = -0.13, p < 0.0001). They did not correlate with the semantic vectors obtained by summation of the semantic vectors of base and affix (r = 0.01, p = 0.56).

The reduced correlations for the derived words, as compared to those for the inflected words, likely reflect to some extent that the model was trained on many more inflected words (in all, 6595) than derived words (898), whereas the number of different inflectional functions (7) was much reduced compared to the number of derivational functions (24). However, the negative correlations of the derivational semantic vectors with the semantic vectors of their derivational functions fit well with the observation in Section 3.2.3 that derivational functions are antiprototypes that enter into negative correlations with the semantic vectors of the corresponding derived words. We suspect that the absence of a correlation of the predicted vectors with the semantic vectors obtained by integrating over the vectors of base and derivational function is due to the semantic idiosyncracies that are typical for derived words. For instance, austerity can denote harsh discipline, or simplicity, or a policy of deficit cutting, or harshness to the taste. And a worker is not just someone who happens to work, but someone earning wages, or a nonreproductive bee or wasp, or a thread in a computer program. Thus, even within the usages of one and the same derived word, there is a lot of semantic heterogeneity that stands in stark contrast to the straightforward and uniform interpretation of inflected words.

4.2.2. The Indirect Route: From Orthography via Phonology to Semantics. Although reading starts with orthographic input, it has been shown that phonology actually plays a role during the reading process as well. For instance, developmental studies indicate that children's reading development can be predicted by their phonological abilities [87, 99]. Evidence of the influence of phonology on adults' reading has also been reported [89, 100–105]. As pointed out by Perrone-Bertolotti et al. [106], silent reading often involves an imagery speech component: we hear our own "inner voice" while reading. Since written words produce a vivid auditory experience almost effortlessly, they argue that auditory verbal imagery should be part of any neurocognitive model of reading. Interestingly, in a study using intracranial EEG recordings with four epileptic neurosurgical patients, they observed that silent reading elicits auditory processing in the absence of any auditory stimulation, suggesting that auditory images are spontaneously evoked during reading (see also [107]).

To explore the role of auditory images in silent reading, we operationalized sound images by means of phone trigrams (in our model, phone trigrams are independently motivated as intermediate targets of speech production; see Section 5 for further discussion). We therefore ran the model again, with phone trigrams instead of letter triplets. The semantic matrix S was exactly the same as above. However, a new transformation matrix F was obtained for mapping a 11480 × 5929 cue matrix C of phone trigrams onto S. To our surprise, the triphone model outperformed the trigram model substantially: overall accuracy rose to 78%, an improvement of almost 20%.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones, and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words, the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have for each word two estimated semantic vectors: a vector ŝ₁, obtained with network F of the first route, and a vector ŝ₂, obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
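The indirect route can be sketched along the same lines; the names T_obs (the observed triphone vectors), K, H, s1, and s2 are ours, chosen to mirror the text:

```r
# Requires MASS::ginv as before; C, T_obs, S, and Fmat as in the earlier sketch
K     <- ginv(C) %*% T_obs     # network K: trigrams -> triphones
T_hat <- C %*% K               # predicted triphone vectors (matrix T of the text)
H     <- ginv(T_hat) %*% S     # network H: predicted triphones -> semantics

s1 <- C %*% Fmat               # route 1: straight from orthography
s2 <- T_hat %*% H              # route 2: via the auditory targets
```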

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the BLP. All 11480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of ŝ₁ and ŝ₂, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of ŝ₁ and ŝ₂, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
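A sketch of how these three measures can be computed from the quantities introduced above (variable names are ours; word2lexome is a hypothetical index mapping each word to its lexome's column in S):

```r
# One value per word, computed from the two estimated semantic vectors and from S
total_activation_diversity <- rowSums(abs(s1)) + rowSums(abs(s2))   # sum of L1-norms
route_congruency <- sapply(seq_len(nrow(s1)), function(i) cor(s1[i, ], s2[i, ]))
prior <- colSums(abs(S))[word2lexome]          # L1-norm of each lexome's column in S
```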

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP, with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
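A GAM of the kind summarized in Table 5 can be specified with the mgcv package; the sketch below is ours and assumes a data frame blp with hypothetical column names for the inverse-transformed latencies and the predictors.

```r
# Sketch of the GAM of Table 5 (assumed data frame 'blp' with columns invRT,
# word_type, word_length, total_activation_diversity, route_congruency, prior).
library(mgcv)

m_ldl <- gam(invRT ~ word_type + word_length +
               s(total_activation_diversity) +   # thin plate regression spline
               te(route_congruency, prior),      # tensor product interaction
             data = blp)
summary(m_ldl)
```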

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed up the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel). For medium range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 65101 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of s2, the AIC increased by 10059 units. However, when the model is based only on the activation diversity of s1, the AIC increased by no less than 14790 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.


Figure 5. The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5. Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients         Estimate    Std. Error    t-value     p-value
intercept                          -1.7774     0.0087        -205.414    < 0.0001
word type: Inflected                0.1110     0.0059          18.742    < 0.0001
word type: Monomorphemic            0.0233     0.0054           4.322    < 0.0001
word length                         0.0117     0.0011          10.981    < 0.0001

B. smooth terms                     edf         Ref.df         F          p-value
s(total activation diversity)       6.002       7.222          23.6      < 0.0001
te(route congruency, prior)         14.673      17.850         21.37     < 0.0001

Figure 6. The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6. Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients                      Estimate    Std. Error    t-value     p-value
intercept                                       -1.6934     0.0086        -195.933    < 0.0001
word type: Inflected                             0.0286     0.0052           5.486    < 0.0001
word type: Monomorphemic                         0.0042     0.0055           0.770    0.44
word length                                      0.0066     0.0011           5.893    < 0.0001

B. smooth terms                                  edf         Ref.df         F          p-value
te(activation, activation diversity, prior)      41.34       49.72          10.84     < 0.0001

The superiority of the second route may in part be due to the number of triphone features being larger than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as a theoretical linguistic construct is deeply problematic [93], and for many spoken forms, canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises of what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

Figure 7. Oscillograms for two different realizations of the word crisis with different degrees of reduction in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8. Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts, provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131,673 word tokens (representing 4779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus (3 8; 7 2)^T = (3 7; 8 2). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

    CF = S
    C^T C F = C^T S
    (C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
    F = (C^T C)^{-1} C^T S                                   (17)

In this way, matrix inversion is required only for the much smaller matrix C^T C, which is a square matrix of size 40,639 × 40,639.
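In R, this normal-equations shortcut amounts to two cross-products and one call to solve(); the sketch below is ours and assumes the cue matrix Ca and the token-wise semantic matrix Sa are already in memory.

```r
# Sketch of (17): estimate F without forming the pseudoinverse of the huge Ca.
# Assumed objects: Ca (131,673 x 40,639 cue matrix), Sa (131,673 x 4609 semantic matrix).
CtC <- crossprod(Ca)        # t(Ca) %*% Ca, 40,639 x 40,639
CtS <- crossprod(Ca, Sa)    # t(Ca) %*% Sa
F   <- solve(CtC, CtS)      # F = (Ca^T Ca)^{-1} Ca^T Sa
# If CtC is (near-)singular, a generalized inverse can be used instead:
# F <- MASS::ginv(CtC) %*% CtS
```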

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the number of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolutional network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.
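The evaluation itself reduces to an argmax over row-wise correlations; the sketch below (ours) assumes matrices S_hat and S_target with one row per audio token, in the same order, and ignores the chunking that a 131,673 × 131,673 correlation matrix would require in practice.

```r
# Sketch of the correlation-based evaluation (assumed objects: S_hat = predicted
# semantic vectors, S_target = gold semantic vectors; same row order).
# Note: for the full dataset the correlation matrix is too large to hold in
# memory at once and would be computed in blocks.
r          <- cor(t(S_hat), t(S_target))        # r[i, j] = cor(prediction i, gold j)
best_match <- apply(r, 1, which.max)            # gold vector closest to each prediction
precision  <- mean(best_match == seq_len(nrow(S_hat)))
```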

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing a classification task with 11,480 classes is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above we introduced the semantic matrix S (13), which we repeat here for convenience:

    S =
                one    two    three
      one    (  1.0    0.3    0.4  )
      two    (  0.2    1.0    0.1  )
      three  (  0.1    0.1    1.0  )                                   (18)

as well as an indicator matrix C specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


    T =
                #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
      one    (   1     1     1     0     0     0     0     0  )
      two    (   0     0     0     1     1     0     0     0  )
      three  (   0     0     0     0     0     1     1     1  )        (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

S′SG = S′T (22)

we have

G = S′T (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂ (24)

For the present example, the inverse of S, S′, is

    S′ =
                one     two     three
      one    (   1.10   -0.29   -0.41 )
      two    (  -0.21    1.07   -0.02 )
      three  (  -0.09   -0.08    1.04 )                                (25)

and the transformation matrix G is

    G =
                #wV    wVn    Vn#    #tu    tu#    #Tr    Tri    ri#
      one    (   1.10   1.10   1.10  -0.29  -0.29  -0.41  -0.41  -0.41 )
      two    (  -0.21  -0.21  -0.21   1.07   1.07  -0.02  -0.02  -0.02 )
      three  (  -0.09  -0.09  -0.09  -0.08  -0.08   1.04   1.04   1.04 )    (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
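The toy example can be reproduced in a few lines of R; the sketch below is ours (object names such as Tm and That are not from the paper's code) and uses the Moore-Penrose generalized inverse from the MASS package.

```r
library(MASS)   # provides ginv(), the Moore-Penrose generalized inverse

# Toy semantic matrix S (rows: words, columns: semantic dimensions), cf. (18)
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"), c("one", "two", "three")))

# Targeted triphone matrix T (called Tm here, since T is reserved in R), cf. (19)
Tm <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
               0, 0, 0, 1, 1, 0, 0, 0,
               0, 0, 0, 0, 0, 1, 1, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("one", "two", "three"),
                             c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")))

G    <- ginv(S) %*% Tm   # transformation matrix, cf. (23)
That <- S %*% G          # predicted triphone support, cf. (24)
round(That, 2)           # very close to the 0/1 pattern of Tm
```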

We made use of the S matrix which we derived from the tasa corpus, as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies, for each of w words, an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all simple paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.
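The core idea of this decoding step, stripped of the heuristics detailed in the Appendix, can be sketched in R with igraph; the support vector, the threshold, and the selection of the longest path below are illustrative choices of ours, not the paper's exact algorithm.

```r
library(igraph)

# Hypothetical triphone support for one word ('#' marks the word boundary)
support <- c("#wV" = 1.00, "wVn" = 1.01, "Vn#" = 0.99, "#tu" = 0.01, "tu#" = 0.02)
cand    <- names(support)[support >= 0.99]          # keep well-supported triphones

# connect triphone a to triphone b when a's last two phones match b's first two
edges <- character(0)
for (a in cand) for (b in cand)
  if (substr(a, 2, 3) == substr(b, 1, 2)) edges <- c(edges, a, b)
g <- make_graph(edges, directed = TRUE)

# enumerate simple paths starting from word-initial triphones
starts <- cand[startsWith(cand, "#")]
paths  <- unlist(lapply(starts, function(v) all_simple_paths(g, from = v)),
                 recursive = FALSE)

# take the longest path; its triphone sequence spells out the word form
best <- paths[[which.max(sapply(paths, length))]]
as_ids(best)    # "#wV" "wVn" "Vn#"
```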

From these analyses, it is clear that mapping nearly 4000 semantic vectors on their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m,a} from S_m and add the semantic vector s_a of the affix:

S_a = S_{m,a} + i ⊗ s_a                                   (27)

Here, i is a unit vector of length k and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a,

S_a G_a = T_a                                   (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

    [ S_m ]          [ T_m ]
    [ S_a ]  G_a  =  [ T_a ]                                   (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

    [ S_m  ]          [ T_m  ]
    [ S_a1 ]          [ T_a1 ]
    [ S_a2 ]    G  =  [ T_a2 ]
    [ ...  ]          [ ...  ]
    [ S_an ]          [ T_an ]                                   (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
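Schematically, the joint mapping of (30) involves nothing more than adding each affix vector to the rows of the pertinent base-word submatrix, stacking the results, and solving once; the R sketch below is ours, with hypothetical objects for two inflectional functions (plural and continuous).

```r
library(MASS)

# Assumed objects (hypothetical names): Sm, Tm = semantic and triphone matrices of
# the base words; Sm_pl, Sm_cont = rows of Sm for bases attested with a given
# inflectional function; s_pl, s_cont = semantic vectors of those functions;
# T_pl, T_cont = triphone matrices of the inflected forms.
S_pl   <- sweep(Sm_pl,   2, s_pl,   "+")   # S_a = S_{m,a} + i (x) s_a, cf. (27)
S_cont <- sweep(Sm_cont, 2, s_cont, "+")

S_aug <- rbind(Sm, S_pl, S_cont)           # stack base and inflected rows, cf. (30)
T_aug <- rbind(Tm, T_pl, T_cont)

G <- ginv(S_aug) %*% T_aug                 # one mapping for all inflectional functions
```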

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10,582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns) and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9. The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15,105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

Table 7. Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients      Estimate     Std. Error    t-value     p-value
Intercept                       -1.9165      0.0592        -32.3908    < 0.0001
Log LDL Edge Weight             -0.1358      0.0400         -3.3960    0.0007
Phrase Rate                      0.0047      0.0021          2.2449    0.0248
Log Ncount                       0.1114      0.0189          5.8837    < 0.0001

B. smooth terms                  edf          Ref.df         F-value    p-value
TPRS log Frequency               1.9007       1.9050         7.5495     0.0004
random intercepts word           1149.8839    1323.0000      31.4919    < 0.0001
random intercepts speaker        30.2995      39.0000        34.5579    < 0.0001

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs; [139–141]). TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises first of all an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncracies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncracies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk ([natyrlk])) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
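For concreteness, the letter trigram cues used to build C can be extracted with a small helper function; this helper is ours (not from the paper's code) and assumes, as elsewhere in the model, that '#' marks the word boundary.

```r
# Letter trigram cues for one orthographic word, with '#' as word boundary marker.
letter_trigrams <- function(w) {
  w <- paste0("#", w, "#")
  substring(w, 1:(nchar(w) - 2), 3:nchar(w))
}
letter_trigrams("John")   # "#Jo" "Joh" "ohn" "hn#"
```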

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Table 8. Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

Figure 10. Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Figure 11. Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
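A single Widrow-Hoff update is a one-line matrix operation; the R sketch below (ours) shows the delta rule for one learning event, with the learning rate eta as an assumed free parameter.

```r
# One Widrow-Hoff (delta rule) update of a linear network W (cues x outcomes),
# given the cue vector and target vector of a single learning event.
widrow_hoff_update <- function(W, cue, target, eta = 0.001) {
  pred  <- as.numeric(cue %*% W)        # current prediction for this event
  error <- as.numeric(target) - pred    # prediction error
  W + eta * outer(as.numeric(cue), error)
}
```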

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yelî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yelî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)
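To illustrate, under this conceptualization the semantic vector targeted by sapīvissemus is simply the sum of the semantic vectors of its lexomes; the R fragment below is a sketch only, and the row names of S are hypothetical.

# Conjoint semantic vector for sapivissemus as the sum of its lexome vectors
# (row names are invented for illustration; S is the semantic matrix).
lexomes <- c("know", "first", "plural", "active", "pluperfect", "subjunctive")
sem_vector <- colSums(S[lexomes, ])   # same dimensionality as a row of S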

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word 'naive' in naive discriminative learning, makes it possible to update network weights very efficiently during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, and crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production: when paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made by recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and their offshoots in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning on the other.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\mathbf{a} = (3, 4), \quad \mathbf{b} = (-2, -3).  (A1)


Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}.  (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.  (A3)

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}.  (A4)

Such a mapping of A onto B is given by a matrix F:

\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix},  (A5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.  (A6)

Using matrix notation, we can write

\mathbf{A}\mathbf{F} = \mathbf{B}.  (A7)
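The toy example can be verified directly in R; the sketch below is our own illustration (note that assigning to F masks R's shorthand for FALSE, which is harmless here).

# A contains the form vectors, F the transformation, and their product is B.
A <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows (3, 4) and (-2, -3)
F <- matrix(c(1, -2, 2, -1), nrow = 2)   # rows (1, 2) and (-2, -1)
A %*% F                                  # rows (-5, 2) and (4, -1), i.e., B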

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}.  (A8)

For numbers, the inverse of multiplication with s is dividing by s:

\left(\frac{1}{s}\right)(sx) = x.  (A9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}.  (A10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}.  (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

\mathbf{A}\mathbf{F} = \mathbf{B}
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}
\mathbf{I}\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}
\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}.  (A12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\mathbf{B}\mathbf{G} = \mathbf{A}.  (A13)

Solving this equation proceeds exactly as above:

\mathbf{B}\mathbf{G} = \mathbf{A}
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}
\mathbf{I}\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}
\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}.  (A14)


The inverse of B is

\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix},  (A15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G}.  (A16)
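In R, both mappings of the toy example can be recovered with solve(), which returns the inverse of a nonsingular matrix (again a sketch of the worked example, not code from the study).

# Comprehension mapping F and production mapping G for the toy example.
A <- matrix(c(3, -2, 4, -3), nrow = 2)
B <- matrix(c(-5, 4, 2, -1), nrow = 2)
F <- solve(A) %*% B   # rows (1, 2) and (-2, -1)
G <- solve(B) %*% A   # rows (-1/3, -2/3) and (2/3, 1/3)
A %*% F               # reproduces B
B %*% G               # reproduces A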

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
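The same logic carries over to the singular, nonsquare matrices used in the remainder of this study; the following sketch (with random toy data standing in for real form and semantic matrices) estimates a comprehension mapping with MASS::ginv.

# Estimating a mapping F from a cue matrix C to a semantic matrix S with the
# Moore-Penrose generalized inverse (toy data; dimensions chosen arbitrarily).
library(MASS)
set.seed(1)
C <- matrix(rbinom(200 * 50, 1, 0.1), nrow = 200)   # 200 words x 50 form features
S <- matrix(rnorm(200 * 20), nrow = 200)            # 200 words x 20 semantic dimensions
F <- ginv(C) %*% S                                  # least-squares solution of CF = S
Shat <- C %*% F                                     # predicted semantic vectors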

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix}.  (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × -2 = -5, and for o2 we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\mathbf{s} = \begin{pmatrix} 2 & 2 \end{pmatrix},  (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

\begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix}.  (A19)

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). Form features go in at the input nodes, semantic vectors come out at the output nodes, and the connection weights are the transformation matrix. When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × -2 = -5, and that at o2 is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at that vertex are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths will trivially have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
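The core graph operations can be sketched with igraph as follows; the triphones, support values, and word in this fragment are invented for illustration, and the sketch simplifies the heuristic described above (it is not the code of the WpmWithLdl package).

# Simplified sketch of the path-selection step, using the igraph package.
library(igraph)

support <- c("#wV" = 0.8, "wVn" = 0.7, "Vn#" = 0.9)   # network support per triphone

# edges connect triphones whose last two segments match the next one's first two
g <- make_graph(c("#wV", "wVn",
                  "wVn", "Vn#"), directed = TRUE)

initial <- V(g)$name[grepl("^#", V(g)$name)]   # word-initial triphones
final   <- V(g)$name[grepl("#$", V(g)$name)]   # word-final triphones

paths <- all_simple_paths(g, from = initial[1], mode = "out")
paths <- Filter(function(p) tail(as_ids(p), 1) %in% final, paths)

# weight raw path support by path length, and select the best-supported path
scores <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
as_ids(paths[[which.max(scores)]])   # triphone sequence driving articulation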

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1-82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221-242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148-203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890-894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245-248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272-11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106-154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13-22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583-602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543-555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531-573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305-341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89-97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211-240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15-23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203-208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393-413, 2010.

[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249-284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1-38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271-294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131-154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309-332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176-181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75-99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64-99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," in 1960 WESCON Convention Record, Part IV, pp. 96-104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267-291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657-667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662-720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39-52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363-386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276-296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151-160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909-957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927-960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438-481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271-1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50-64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290-311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174-1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485-515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1-15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259-284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13-25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236-244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517-1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651-1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728-1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.

[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653-669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling, et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730-1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5-42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171-1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47-67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1-47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191-1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585-600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232-270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836-1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61-78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334-346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283-321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905-916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Journal of Cognitive Neuroscience, vol. 19, no. 9, pp. 1464-1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927-964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737-767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163-174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077-1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789-804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287-304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73-87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461-480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361-1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524-541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759-777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2-13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887-893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal, et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554-17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146-3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589-608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6-23, 2005.

[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89-95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29-54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711-731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149-179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357-395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373-405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347-350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47-65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL," R package version 1.0.1, 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151-164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97-100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162-173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106-128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155-180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952-981, 2011.
[127] G. Csárdi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1-9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5-38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529-572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1-10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129-162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245-264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21-46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527-555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99-130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58-74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209-226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476-491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141-166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665-1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117-127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886-893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2-20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.

[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181-216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35-45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479-493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373-418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169-218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107-125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157-203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865-892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215-216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907-924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353-361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204-256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491-528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323-345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47-57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204-238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of F1 and F2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169-190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647-3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61-72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


In order to integrate this finding into our model, we implemented an additional network K that maps orthographic vectors (coding the presence or absence of trigrams) onto phonological vectors (indicating the presence or absence of triphones). The result is a dual route model for (silent) reading, with a first route utilizing a network mapping straight from orthography to semantics, and a second, indirect route utilizing two networks, one mapping from trigrams to triphones and the other subsequently mapping from triphones to semantics.

The network K, which maps trigram vectors onto triphone vectors, provides good support for the relevant triphones, but does not provide guidance as to their ordering. Using a graph-based algorithm detailed in the Appendix (see also Section 5.2), we found that for 92% of the words the correct sequence of triphones, jointly spanning the word's sequence of letters, received maximal support.

We then constructed a matrix T̂ with the triphone vectors predicted from the trigrams by network K, and defined a new network H mapping these predicted triphone vectors onto the semantic vectors S. The mean correlation for the correctly recognized words was 0.90, and the median correlation was 0.94. For words that were not recognized correctly, the mean and median correlation were 0.45 and 0.43, respectively. We now have, for each word, two estimated semantic vectors: a vector s1 obtained with network F of the first route, and a vector s2 obtained with network H from the auditory targets that themselves were predicted by the orthographic cues. The mean of the correlations of these pairs of vectors was r = 0.73 (p < 0.0001).
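As an illustration of how the two routes can be set up computationally, the following R sketch estimates the networks with the Moore-Penrose generalized inverse (MASS::ginv). The matrix names Co (trigram cue matrix), Cp (triphone matrix), and S (semantic matrix) are ours and serve only to make the mappings explicit; they are not taken from the authors' own scripts.

    library(MASS)                 # provides ginv(), the Moore-Penrose generalized inverse

    ## first route: letter trigrams -> semantics
    Fo <- ginv(Co) %*% S          # Co: words x trigrams, S: words x semantic dimensions
    S1 <- Co %*% Fo               # estimated semantic vectors s1, one row per word

    ## second route: trigrams -> triphones -> semantics
    K  <- ginv(Co) %*% Cp         # Cp: words x triphones
    Tp <- Co %*% K                # triphone vectors predicted from the trigrams
    H  <- ginv(Tp) %*% S
    S2 <- Tp %*% H                # estimated semantic vectors s2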

To assess to what extent the networks that we defined are informative for actual visual word recognition, we made use of the reaction times in the visual lexical decision task available in the British Lexicon Project (BLP). All 11,480 words in the current dataset are also included in the BLP. We derived three measures from the two networks.

The first measure is the sum of the L1-norms of s1 and s2, to which we will refer as a word's total activation diversity. The total activation diversity is an estimate of the support for a word's lexicality as provided by the two routes. The second measure is the correlation of s1 and s2, to which we will refer as a word's route congruency. The third measure is the L1-norm of a lexome's column vector in S, to which we refer as its prior. This last measure has previously been observed to be a strong predictor for lexical decision latencies [59]. A word's prior is an estimate of a word's entrenchment and prior availability.
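Under the same assumed variable names as in the sketch above, the three measures can be obtained as follows; the row and column labels are hypothetical placeholders, not the authors' own naming.

    ## total activation diversity: sum of the L1-norms of s1 and s2
    total_activation_diversity <- rowSums(abs(S1)) + rowSums(abs(S2))

    ## route congruency: correlation of s1 and s2, word by word
    route_congruency <- sapply(seq_len(nrow(S1)),
                               function(i) cor(S1[i, ], S2[i, ]))

    ## prior: L1-norm of a lexome's column vector in S
    prior <- colSums(abs(S))[rownames(S1)]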

We fitted a generalized additive model to the inverse-transformed response latencies in the BLP with these three measures as key covariates of interest, including as control predictors word length and word type (derived, inflected, and monolexomic, with derived as reference level). As shown in Table 5, inflected words were responded to more slowly than derived words, and the same holds for monolexomic words, albeit to a lesser extent. Response latencies also increased with length. Total activation diversity revealed a U-shaped effect (Figure 5, left panel). For all but the lowest total activation diversities, we find that response times increase as total activation diversity increases. This result is consistent with previous findings for auditory word recognition [18]. As expected given the results of Baayen et al. [59], the prior was a strong predictor, with larger priors affording shorter response times. There was a small effect of route congruency, which emerged in a nonlinear interaction with the prior, such that for large priors, a greater route congruency afforded a further reduction in response time (Figure 5, right panel). Apparently, when the two routes converge on the same semantic vector, uncertainty is reduced and a faster response can be initiated.
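A minimal sketch of such a GAM, using the mgcv package and our own hypothetical variable names for the BLP data frame, is given below; it mirrors the structure of Table 5 but is not the authors' own script.

    library(mgcv)

    m_ldl <- gam(invRT ~ WordType + WordLength +
                   s(TotalActivationDiversity) +      # thin plate regression spline
                   te(RouteCongruency, Prior),        # tensor product smooth
                 data = blp)                          # invRT: inverse-transformed latencies
    summary(m_ldl)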

For comparison, the dual-route model was implemented with ndl as well. Three measures were derived from the networks and used to predict the same response latencies. These measures include activation, activation diversity, and prior, all of which have been reported to be reliable predictors for RT in visual lexical decision [54, 58, 59]. However, since here we are dealing with two routes, the three measures can be independently derived from both routes. We therefore summed up the measures of the two routes, obtaining three measures: total activation, total activation diversity, and total prior. With word type and word length as control factors, Table 6 shows that these measures participated in a three-way interaction, presented in Figure 6. Total activation showed a U-shaped effect on RT that is increasingly attenuated as total activation diversity is increased (left panel). Total activation also interacted with the total prior (right panel). For medium range values of total activation, RTs increased with total activation and, as expected, decreased with the total prior. The center panel shows that RTs decrease with total prior for most of the range of total activation diversity, and that the effect of total activation diversity changes sign going from low to high values of the total prior.

Model comparison revealed that the generalized additive model with ldl measures (Table 5) provides a substantially improved fit to the data compared to the GAM using ndl measures (Table 6), with a difference of no less than 651.01 AIC units. At the same time, the GAM based on ldl is less complex.

Given the substantially better predictability of ldl measures on human behavioral data, one remaining question is whether it is really the case that two routes are involved in silent reading. After all, the superior accuracy of the second route might be an artefact of a simple machine learning technique performing better for triphones than for trigrams. This question can be addressed by examining whether the fit of the GAM summarized in Table 5 improves or worsens depending on whether the activation diversity of the first or the second route is taken out of commission.

When the GAM is provided access to just the activation diversity of s2, the AIC increased by 100.59 units. However, when the model is based only on the activation diversity of s1, the AIC increased by no less than 147.90 units. From this we conclude that, at least for the visual lexical decision latencies in the BLP, the second route, first mapping trigrams to triphones and then mapping triphones onto semantic vectors, plays the more important role.

The superiority of the second route may in part be due to the number of triphone features being larger


Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error    t-value     p-value
intercept                         -1.7774     0.0087      -205.414    < 0.0001
word type = Inflected              0.1110     0.0059        18.742    < 0.0001
word type = Monomorphemic          0.0233     0.0054         4.322    < 0.0001
word length                        0.0117     0.0011        10.981    < 0.0001

B. smooth terms                      edf       Ref.df        F          p-value
s(total activation diversity)       6.002      7.222        23.6      < 0.0001
te(route congruency, prior)        14.673     17.850        21.37     < 0.0001

Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients        Estimate   Std. Error    t-value     p-value
intercept                         -1.6934     0.0086      -195.933    < 0.0001
word type = Inflected              0.0286     0.0052         5.486    < 0.0001
word type = Monomorphemic          0.0042     0.0055         0.770      0.4
word length                        0.0066     0.0011         5.893    < 0.0001

B. smooth terms                                       edf      Ref.df      F        p-value
te(activation, activation diversity, prior)          41.34     49.72      10.84    < 0.0001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing, and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL-scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the ucla library broadcast newsscape data, a vast repository of multimodal TV news broadcasts


Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts, where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, to a total of 131,673 word tokens (representing 4779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C′S, the calculation of C′ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by $\mathbf{X}^T$, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus $\left(\begin{smallmatrix}3 & 8\\ 7 & 2\end{smallmatrix}\right)^T = \left(\begin{smallmatrix}3 & 7\\ 8 & 2\end{smallmatrix}\right)$. For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

$$
\begin{aligned}
\mathbf{C}\mathbf{F} &= \mathbf{S} \\
\mathbf{C}^{T}\mathbf{C}\,\mathbf{F} &= \mathbf{C}^{T}\mathbf{S} \\
(\mathbf{C}^{T}\mathbf{C})^{-1}(\mathbf{C}^{T}\mathbf{C})\,\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S} \\
\mathbf{F} &= (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{S}
\end{aligned}
\tag{17}
$$

In this way, matrix inversion is required only for the much smaller matrix $\mathbf{C}^T\mathbf{C}$, which is a square matrix of size 40,639 × 40,639.
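In R, this solution via the normal equations can be computed with crossprod() and solve(); the following is a sketch in which C and S stand for the cue and semantic matrices defined above.

    CtC <- crossprod(C)       # t(C) %*% C, a 40639 x 40639 matrix
    CtS <- crossprod(C, S)    # t(C) %*% S
    F   <- solve(CtC, CtS)    # F = (C'C)^{-1} C'S, cf. (17)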

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the number of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolutional network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.
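The evaluation amounts to checking, for every audio token, whether the targeted gold vector is the most strongly correlated gold vector. A sketch in R (for expository purposes only; with 131,673 tokens the full correlation matrix would in practice be computed in chunks):

    S_hat <- C %*% F                          # estimated semantic vectors
    r     <- cor(t(S_hat), t(S))              # tokens x tokens matrix of Pearson correlations
    best  <- apply(r, 1, which.max)           # index of the most strongly correlated gold vector
    precision <- mean(best == seq_len(nrow(S)))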

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11,480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

$$
\mathbf{S} =
\begin{array}{l|ccc}
 & \text{one} & \text{two} & \text{three} \\ \hline
\text{one} & 1.0 & 0.3 & 0.4 \\
\text{two} & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array}
\tag{18}
$$

as well as an indicator matrix C specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


$$
\mathbf{T} =
\begin{array}{l|cccccccc}
 & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one} & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two} & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}
\tag{19}
$$

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

$$\mathbf{S}\mathbf{G} = \mathbf{T}. \tag{20}$$

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

$$\mathbf{t} = \mathbf{s}\mathbf{G}. \tag{21}$$

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

$$\mathbf{S}'\mathbf{S}\mathbf{G} = \mathbf{S}'\mathbf{T}, \tag{22}$$

we have

$$\mathbf{G} = \mathbf{S}'\mathbf{T}. \tag{23}$$

Given G, we can predict the triphone matrix $\hat{\mathbf{T}}$ from the semantic matrix S:

$$\mathbf{S}\mathbf{G} = \hat{\mathbf{T}}. \tag{24}$$

For the present example, the generalized inverse S′ of S is

$$
\mathbf{S}' =
\begin{array}{l|ccc}
 & \text{one} & \text{two} & \text{three} \\ \hline
\text{one} & 1.10 & -0.29 & -0.41 \\
\text{two} & -0.21 & 1.07 & -0.02 \\
\text{three} & -0.09 & -0.08 & 1.04
\end{array}
\tag{25}
$$

and the transformation matrix G is

$$
\mathbf{G} =
\begin{array}{l|cccccccc}
 & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one} & 1.10 & 1.10 & 1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \\
\text{two} & -0.21 & -0.21 & -0.21 & 1.07 & 1.07 & -0.02 & -0.02 & -0.02 \\
\text{three} & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 & 1.04 & 1.04 & 1.04
\end{array}
\tag{26}
$$

For this simple example, $\hat{\mathbf{T}}$ is virtually identical to T. For realistic data, $\hat{\mathbf{T}}$ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
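The toy example can be reproduced in a few lines of R; the matrices below are the S of (18) and the T of (19) as reconstructed above, and ginv() provides S′.

    library(MASS)

    S  <- matrix(c(1.0, 0.3, 0.4,
                   0.2, 1.0, 0.1,
                   0.1, 0.1, 1.0), 3, 3, byrow = TRUE,
                 dimnames = list(c("one", "two", "three"), c("one", "two", "three")))
    Tm <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0,
                   0, 0, 0, 1, 1, 0, 0, 0,
                   0, 0, 0, 0, 0, 1, 1, 1), 3, 8, byrow = TRUE,
                 dimnames = list(c("one", "two", "three"),
                                 c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")))

    G     <- ginv(S) %*% Tm    # cf. (23)
    T_hat <- S %*% G           # cf. (24); virtually identical to Tm
    round(T_hat, 2)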

We made use of the S matrix which we derived from the tasa corpus, as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the


longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
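The following R sketch illustrates this path reconstruction for a single word, using our own variable names: t_hat is a named vector of triphone supports, and triphones are assumed to be coded as three-character strings with '#' marking word boundaries.

    library(igraph)

    active <- names(t_hat)[t_hat > 0.99]            # well-supported triphones

    ## connect triphone "abc" to triphone "bcd": an overlap of two phones
    edges <- do.call(rbind, lapply(active, function(v) {
      heads <- active[substr(active, 1, 2) == substr(v, 2, 3)]
      if (length(heads)) cbind(v, heads) else NULL
    }))
    g <- graph_from_data_frame(as.data.frame(edges), directed = TRUE,
                               vertices = data.frame(name = active))

    initial <- active[substr(active, 1, 1) == "#"]  # left-edge triphones
    paths <- unlist(lapply(initial, function(v) all_simple_paths(g, from = v)),
                    recursive = FALSE)
    best  <- paths[[which.max(sapply(paths, length))]]  # longest path spells out the form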

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let $\mathbf{S}_m$ and $\mathbf{T}_m$ denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let $\mathbf{T}_a$ denote the matrix with the triphone vectors of these inflected words, and let $\mathbf{S}_a$ denote the corresponding semantic vectors. To obtain $\mathbf{S}_a$, we take the pertinent submatrix $\mathbf{S}_{m_a}$ from $\mathbf{S}_m$ and add the semantic vector $\mathbf{s}_a$ of the affix:

$$\mathbf{S}_a = \mathbf{S}_{m_a} + \mathbf{i} \otimes \mathbf{s}_a. \tag{27}$$

Here, i is a unit vector of length k and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of $\mathbf{s}_a$ row-wise. As a first step, we could define a separate mapping $\mathbf{G}_a$ for each inflectional function a:

$$\mathbf{S}_a \mathbf{G}_a = \mathbf{T}_a, \tag{28}$$

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

$$\begin{bmatrix} \mathbf{S}_m \\ \mathbf{S}_a \end{bmatrix} \mathbf{G}_a = \begin{bmatrix} \mathbf{T}_m \\ \mathbf{T}_a \end{bmatrix}. \tag{29}$$

The dimension of $\mathbf{G}_a$ (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

$$
\begin{bmatrix}
\mathbf{S}_m \\ \mathbf{S}_{a_1} \\ \mathbf{S}_{a_2} \\ \vdots \\ \mathbf{S}_{a_n}
\end{bmatrix}
\mathbf{G} =
\begin{bmatrix}
\mathbf{T}_m \\ \mathbf{T}_{a_1} \\ \mathbf{T}_{a_2} \\ \vdots \\ \mathbf{T}_{a_n}
\end{bmatrix}.
\tag{30}
$$

The dimension of G is identical to that of $\mathbf{G}_a$, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
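A sketch of how the augmented matrices can be assembled in R, assuming submatrices S_m and T_m for the monolexomic words and one affix vector per inflectional function (all names are ours and purely illustrative):

    ## semantic vectors for the inflected variants of one inflectional function:
    ## add the affix vector to each relevant base vector, cf. (27)
    S_a1 <- sweep(S_m[words_with_a1, , drop = FALSE], 2, s_a1, "+")
    S_a2 <- sweep(S_m[words_with_a2, , drop = FALSE], 2, s_a2, "+")

    ## one joint mapping for base words and all inflected words, cf. (30)
    S_aug <- rbind(S_m, S_a1, S_a2)
    T_aug <- rbind(T_m, T_a1, T_a2)
    G     <- MASS::ginv(S_aug) %*% T_aug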

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10,582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus,


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.
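Schematically, the out-of-bag prediction for a single held-out form such as walking looks as follows in R; walk and s_continuous are hypothetical names for the base word's row in S_m and the affix vector.

    G     <- MASS::ginv(S_train) %*% T_train    # trained on stems plus 90% of inflected forms
    s_new <- S_m["walk", ] + s_continuous       # semantic vector of the unseen inflected form
    t_hat <- s_new %*% G                        # support for the triphones, cf. (21)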

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of $\hat{\mathbf{T}} = \mathbf{S}\mathbf{G}$ supported the targeted triphones above all other triphones, without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths $\hat{\mathbf{T}}$. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14 of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, the d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15,105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL edge weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
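The model of Table 7 can be sketched with mgcv as follows; data frame and variable names are ours, not the authors' own.

    library(mgcv)

    m_dur <- gam(LogRelativeDuration ~
                   LogLDLEdgeWeight + PhraseRate + LogNcount +
                   s(LogFrequency) +               # thin plate regression spline
                   s(Word, bs = "re") +            # random intercepts for word
                   s(Speaker, bs = "re"),          # random intercepts for speaker
                 data = buckeye)
    summary(m_dur)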

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients      Estimate   Std. Error    t-value     p-value
intercept                       -1.9165     0.0592      -32.3908    < 0.0001
log LDL Edge Weight             -0.1358     0.0400       -3.3960      0.0007
Phrase Rate                      0.0047     0.0021        2.2449      0.0248
log Ncount                       0.1114     0.0189        5.8837    < 0.0001

B. smooth terms                      edf         Ref.df      F-value    p-value
TPRS log Frequency                   1.9007       1.9050      7.5495     0.0004
random intercepts word            1149.8839    1323.0000     31.4919   < 0.0001
random intercepts speaker           30.2995      39.0000     34.5579   < 0.0001

maps (TSOMs; [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content


lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise, articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
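The logic of this toy simulation can be sketched as follows; C, L, S, and the row labels are our shorthand (in the simulation, rows correspond to the word tokens in their order of occurrence), not the authors' own code.

    library(MASS)

    ## C: tokens x letter trigrams; L: the corresponding semantic (lexome) vectors
    F     <- ginv(C) %*% L
    L_hat <- C %*% F                          # estimated semantic vectors

    ## "John Welsch" read with two fixations: sum the two estimated vectors
    v <- L_hat["John", ] + L_hat["Welsch", ]
    cor(v, S["JohnWelsch", ])                 # cf. the upper right panel of Figure 10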

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As, in the small world of the present toy example, Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

[Figure 10: four panels (John, John Welsch, Janet, Clark), each showing the Pearson correlation (y-axis, from −0.5 to 1.0) of the estimated semantic vector with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark (x-axis).]

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.
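The correlations plotted in Figure 10 are plain Pearson correlations between estimated and targeted semantic vectors. A minimal sketch of how such a comparison can be carried out in R is given below; the matrices Shat (row-wise estimated semantic vectors) and S (row-wise targeted semantic vectors) are hypothetical stand-ins filled with random numbers, not the vectors used in this study.

```r
# toy stand-ins for targeted (S) and estimated (Shat) semantic vectors
set.seed(1)
S <- matrix(rnorm(5 * 100), nrow = 5,
            dimnames = list(c("JohnClark", "JohnWelsch", "JohnWiggam",
                              "AnneHastie", "JanetClark"), NULL))
Shat <- S[c(1, 2, 5, 1), ] + matrix(rnorm(4 * 100, sd = 0.5), nrow = 4)
rownames(Shat) <- c("John", "John Welsch", "Janet", "Clark")

# Pearson correlation of each estimated vector with each targeted vector
cors <- cor(t(Shat), t(S))
round(cors, 2)
```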


Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see [47] for some evidence). One such interaction is explored in Baayen et al. [85]: in their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.


[Figure 11: diagram of the discriminative lexicon. Input systems (cochlea, retina) feed auditory cues (FBS features) and orthographic cues (trigrams); these are linked, together with the semantic vectors (TASA), the auditory targets (triphones), and the orthographic targets (trigrams), by the networks C_a, C_o, F_a, F_o, G_a, G_o, H_a, K_a, T_a, and T_o, which drive the output systems for speaking and typing.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.


Incremental learning. In the present study we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
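By way of illustration, the sketch below implements a single Widrow-Hoff (delta rule) update step for a weight matrix mapping a cue vector onto an outcome vector; the function name, the toy vectors, and the learning rate eta are our own illustrative choices, not part of the code used in this study.

```r
# minimal sketch of one Widrow-Hoff (delta rule) update of a weight matrix W
# W:   weight matrix (one row per cue, one column per outcome dimension)
# cue: numeric cue vector for the current learning event
# out: numeric target (outcome) vector for the current learning event
widrow_hoff_update <- function(W, cue, out, eta = 0.01) {
  pred <- as.vector(cue %*% W)        # what the network currently predicts
  W + eta * (cue %o% (out - pred))    # shift weights along the prediction error
}

# toy usage: three cues, two semantic dimensions
W   <- matrix(0, nrow = 3, ncol = 2)
cue <- c(1, 0, 1)
out <- c(0.5, -0.2)
for (t in 1:500) W <- widrow_hoff_update(W, cue, out)
round(cue %*% W, 3)   # converges toward the target vector (0.5, -0.2)
```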

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155-157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1,000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to update network weights very efficiently during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study we have obtained good results with a semantic matrix with a dimensionality of around 4,000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4,000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning on the other.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40, 41, 42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear: networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

a = (3 4) b = (minus2 minus3) (A1)


[Figure 12: a plot of the x and y axes (both running from -4 to 4), showing the points a and b together with the points x and y.]

Figure 12: Points a and b can be transformed into the points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\[
\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2}
\]

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\[
\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3}
\]

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:
\[
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4}
\]

Such a mapping of A onto B is given by a matrix F,

\[
\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5}
\]

and it is straightforward to verify that indeed
\[
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6}
\]

Using matrix notation we can write

\[
\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A7}
\]

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8}
\]

For numbers, the inverse of multiplication with s is dividing by s:
\[
\left(\frac{1}{s}\right)(s x) = x. \tag{A9}
\]

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\[
\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10}
\]

For nonsquare matrices, the inverse Y^{-1} is defined such that

\[
\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11}
\]

We find the square matrix F that maps the square matrix A onto B as follows:

\[
\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B} \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \tag{A12}
\]

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).
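This worked example is easy to check in R (a minimal sketch; solve() computes the matrix inverse and %*% is matrix multiplication):

```r
A <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows are the form vectors a and b
B <- matrix(c(-5, 4, 2, -1), nrow = 2)   # rows are the semantic vectors x and y

F <- solve(A) %*% B   # F = A^{-1} B
F                     # yields the matrix in (A5): rows (1, 2) and (-2, -1)
A %*% F               # recovers B, as in (A6)
```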

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\[
\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A13}
\]

Solving this equation proceeds exactly as above:

\[
\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A} \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \tag{A14}
\]


The inverse of B is

\[
\mathbf{B}^{-1} = \begin{pmatrix} \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{4}{3} & \tfrac{5}{3} \end{pmatrix}, \tag{A15}
\]

and hence we have

\[
\begin{pmatrix} \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{4}{3} & \tfrac{5}{3} \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
=
\begin{pmatrix} -\tfrac{1}{3} & -\tfrac{2}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix}
= \mathbf{G}. \tag{A16}
\]

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study we denote the generalized inverse of a matrix X by X′.
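A minimal sketch of this estimation strategy with a nonsquare toy matrix is given below; the numbers are arbitrary illustrative values, with C standing in for a matrix of form vectors and S for the corresponding semantic vectors:

```r
library(MASS)   # provides ginv(), the Moore-Penrose generalized inverse

# toy 'form' matrix C (3 words, 2 form features) and semantic matrix S (3 words, 2 dimensions)
C <- matrix(c(1, 0, 1,
              0, 1, 1), nrow = 3)
S <- matrix(c( 2, -1, 1,
              -1,  3, 2), nrow = 3)

# solve(C) is undefined (C is not square); the generalized inverse still
# yields a least-squares transformation matrix F = C'S
F <- ginv(C) %*% S
C %*% F   # approximates S as closely as a linear mapping allows
```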

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:
\[
\begin{pmatrix} 3 & 4 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \end{pmatrix}. \tag{A17}
\]

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\[
\mathbf{s} = \begin{pmatrix} 2 & 2 \end{pmatrix}, \tag{A18}
\]

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
\[
\begin{pmatrix} 2 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -2 & 2 \end{pmatrix}. \tag{A19}
\]

[Figure 13: a two-layer network with input nodes i1 and i2 (form features go in here), output nodes o1 and o2 (semantic vectors come out here), and connection weights 1, 2, -2, -1; the connection weights are the transformation matrix.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5, and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol) to a word's final triphone (the triphone ending with the word boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and the whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at that vertex are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
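The core of this procedure can be sketched with igraph as follows. The triphone inventory and support values below are made-up toy numbers (word boundaries are marked with #), and the full implementation is provided in the WpmWithLdl package mentioned below.

```r
library(igraph)

# toy triphone supports for a hypothetical word #ban# ('#' marks the word boundary)
support <- c("#ba" = 0.62, "ban" = 0.55, "an#" = 0.48, "bad" = 0.07)
keep    <- names(support)[support > 0.1]    # vertices with sufficient network support

# edges connect triphones whose last two segments match the next triphone's first two
pairs <- expand.grid(from = keep, to = keep, stringsAsFactors = FALSE)
pairs <- pairs[substr(pairs$from, 2, 3) == substr(pairs$to, 1, 2), ]
g     <- graph_from_data_frame(pairs, vertices = data.frame(name = keep))

# all simple paths from a boundary-initial to a boundary-final triphone
paths   <- all_simple_paths(g, from = "#ba", to = "an#")
# weight raw path support by path length and select the best-supported path
weights <- sapply(paths, function(p) sum(support[names(p)]) / length(p))
names(paths[[which.max(weights)]])
# [1] "#ba" "ban" "an#"
```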

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.


[Figure 14: a clustered heatmap of the pairwise correlations of the semantic vectors of English pronouns and prepositions, with a color key ranging from roughly -0.1 to 0.25.]

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
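A heatmap of this kind can be generated along the following lines (a minimal sketch in R with a random stand-in for the semantic matrix S; the figure itself was produced from the TASA-based semantic vectors discussed in the text):

```r
# toy stand-in: one row per function-word lexome, columns are semantic dimensions
set.seed(1)
S <- matrix(rnorm(8 * 50), nrow = 8,
            dimnames = list(c("we", "us", "she", "her",
                              "across", "over", "about", "within"), NULL))

cors <- cor(t(S))   # pairwise correlations of the row vectors
heatmap(cors, symm = TRUE, margins = c(6, 6), col = rev(heat.colors(25)))
```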


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1-82, 1967.

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272-11276, 2016.

[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438-481, 2011.

[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271-1278, 2004.

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C. Shaoul, S. Bitschau, N. Schilling, et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232-270, 2018.

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1-10, 2015.

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018

Complexity 39

[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R Jackendoff Semantic StructuresMITPress Cambridge 1990[155] J J McCarthy ldquoA prosodic theory of non-concatenative mor-

phologyrdquo Linguistic Inquiry vol 12 pp 373ndash418 1981[156] A Ussishkin ldquoA Fixed Prosodic Theory of Nonconcatenative

Templaticmorphologyrdquo Natural Language amp Linguistic Theoryvol 23 no 1 pp 169ndash218 2005

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010


Figure 5: The partial effects of total activation diversity (left) and the interaction of route congruency and prior (right) on RT in the British Lexicon Project.

Table 5: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures based on ldl. s: thin plate regression spline smooth; te: tensor product smooth.

A. parametric coefficients       Estimate   Std. Error    t-value    p-value
intercept                         -1.7774      0.0087    -205.414   < 0.0001
word type = Inflected              0.1110      0.0059      18.742   < 0.0001
word type = Monomorphemic          0.0233      0.0054       4.322   < 0.0001
word length                        0.0117      0.0011      10.981   < 0.0001
B. smooth terms                        edf      Ref.df           F    p-value
s(total activation diversity)        6.002       7.222        23.6   < 0.0001
te(route congruency, prior)         14.673      17.850       21.37   < 0.0001

Figure 6: The interaction of total activation and total activation diversity (left), of total prior by total activation diversity (center), and of total activation by total prior (right) on RT in the British Lexicon Project, based on ndl.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision, using measures from ndl. te: tensor product smooth.

A. parametric coefficients       Estimate   Std. Error    t-value    p-value
intercept                         -1.6934      0.0086    -195.933   < 0.0001
word type = Inflected              0.0286      0.0052       5.486   < 0.0001
word type = Monomorphemic          0.0042      0.0055       0.770      = 0.4
word length                        0.0066      0.0011       5.893   < 0.0001
B. smooth terms                                   edf    Ref.df         F    p-value
te(activation, activation diversity, prior)     41.34     49.72     10.84   < 0.0001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the ucla library broadcast newsscape data, a vast repository of multimodal TV news broadcasts,


Figure 7: Oscillogram for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, amounting to 131,673 word tokens (representing 4779 word types), with in all 40,639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131,673 audio tokens × 40,639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131,673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C'S, the calculation of C' is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa. Thus
\[\begin{pmatrix} 3 & 8 \\ 7 & 2 \end{pmatrix}^T = \begin{pmatrix} 3 & 7 \\ 8 & 2 \end{pmatrix}.\]
For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

\[
\begin{aligned}
CF &= S\\
C^T C F &= C^T S\\
(C^T C)^{-1} (C^T C) F &= (C^T C)^{-1} C^T S\\
F &= (C^T C)^{-1} C^T S
\end{aligned} \tag{17}
\]

In this way, matrix inversion is required only for a much smaller matrix, C^T C, which is a square matrix of size 40,639 × 40,639.
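For concreteness, the normal-equation estimate of (17) can be written out in a few lines of R; this is a minimal sketch in which small random matrices stand in for the actual C_a and S, and the variable names are ours rather than those of the original scripts:

```r
# Minimal sketch of estimating F via the normal equations (17).
# Small random stand-ins for the real C (tokens x FBSFs) and S (tokens x semantic dimensions).
set.seed(1)
n_tokens  <- 200
n_cues    <- 50
n_semdims <- 20

C <- matrix(rbinom(n_tokens * n_cues, 1, 0.1), n_tokens, n_cues)  # binary cue indicator matrix
S <- matrix(rnorm(n_tokens * n_semdims), n_tokens, n_semdims)     # semantic vectors

# F = (C'C)^{-1} C'S: only the n_cues x n_cues matrix crossprod(C) needs to be inverted.
F_hat <- solve(crossprod(C), crossprod(C, S))

# Predicted semantic vectors for the training tokens.
S_hat <- C %*% F_hat
```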

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131,673 × 131,673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.
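The correlation-based evaluation can be sketched as follows (again a minimal, self-contained illustration with toy matrices; the names are ours, not those of the original scripts):

```r
# Minimal sketch of the correlation-based evaluation.
set.seed(2)
S     <- matrix(rnorm(30), 6, 5)                 # 6 gold-standard semantic vectors
S_hat <- S + matrix(rnorm(30, sd = 0.3), 6, 5)   # noisy predictions standing in for C %*% F

r    <- cor(t(S_hat), t(S))                  # r[i, j]: correlation of prediction i with gold vector j
best <- apply(r, 1, which.max)               # index of the best-matching gold vector per prediction
precision <- mean(best == seq_len(nrow(S)))  # proportion of tokens recognized correctly
```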

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11,480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above we introduced the semantic matrix S (13), which we repeat here for convenience:

\[
S = \begin{pmatrix} 1.0 & 0.3 & 0.4 \\ 0.2 & 1.0 & 0.1 \\ 0.1 & 0.1 & 1.0 \end{pmatrix} \tag{18}
\]
(rows and columns indexed by one, two, three),

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


\[
T = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix} \tag{19}
\]
(rows: one, two, three; columns: #wV, wVn, Vn#, #tu, tu#, #Tr, Tri, ri#).

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

\[ SG = T. \tag{20} \]

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

\[ t = sG. \tag{21} \]

As before, the transformation matrix G is straightforward to estimate. Let S' denote the Moore-Penrose generalized inverse of S. Since

\[ S'SG = S'T, \tag{22} \]

we have

\[ G = S'T. \tag{23} \]

Given G, we can predict the triphone matrix \(\hat{T}\) from the semantic matrix S:

\[ SG = \hat{T}. \tag{24} \]

For the present example, the inverse of S, S', is

\[
S' = \begin{pmatrix}
1.10 & -0.29 & -0.41 \\
-0.21 & 1.07 & -0.02 \\
-0.09 & -0.08 & 1.04
\end{pmatrix} \tag{25}
\]
(rows and columns: one, two, three),

and the transformation matrix G is

\[
G = \begin{pmatrix}
1.10 & 1.10 & 1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \\
-0.21 & -0.21 & -0.21 & 1.07 & 1.07 & -0.02 & -0.02 & -0.02 \\
-0.09 & -0.09 & -0.09 & -0.08 & -0.08 & 1.04 & 1.04 & 1.04
\end{pmatrix} \tag{26}
\]
(rows: one, two, three; columns: #wV, wVn, Vn#, #tu, tu#, #Tr, Tri, ri#).

For this simple example, \(\hat{T}\) is virtually identical to T. For realistic data, \(\hat{T}\) will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
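The toy example can be verified directly in R; the sketch below assumes the matrices as reconstructed in (18) and (19), and uses ginv() from the MASS package for the Moore-Penrose generalized inverse:

```r
library(MASS)  # for ginv(), the Moore-Penrose generalized inverse

# Semantic matrix S (rows and columns: one, two, three), cf. (18).
S <- matrix(c(1.0, 0.3, 0.4,
              0.2, 1.0, 0.1,
              0.1, 0.1, 1.0), nrow = 3, byrow = TRUE,
            dimnames = list(c("one", "two", "three"), c("one", "two", "three")))

# Triphone matrix T, cf. (19).
triphones <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
Tmat <- matrix(c(1,1,1,0,0,0,0,0,
                 0,0,0,1,1,0,0,0,
                 0,0,0,0,0,1,1,1), nrow = 3, byrow = TRUE,
               dimnames = list(c("one", "two", "three"), triphones))

G    <- ginv(S) %*% Tmat   # transformation matrix, cf. (23)
That <- S %*% G            # predicted triphone supports, cf. (24)
round(That, 2)             # virtually identical to Tmat
```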

We made use of the S matrix, which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the


longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.
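The following simplified R sketch illustrates the path-based reconstruction with igraph for the triphones of blending; it omits the thresholding, the bridging triphones, and the length penalty of the full heuristic algorithm described in the appendix:

```r
library(igraph)

# Triphones whose predicted support exceeds the threshold (here: the triphones of "blending").
active <- c("#bl", "blE", "lEn", "End", "ndI", "dIN", "IN#")

# Connect triphone u to triphone v when they overlap in two segments (e.g., "ndI" -> "dIN").
edges <- do.call(rbind, lapply(active, function(u)
  do.call(rbind, lapply(active, function(v)
    if (substr(u, 2, 3) == substr(v, 1, 2)) data.frame(from = u, to = v)))))

g <- graph_from_data_frame(edges, directed = TRUE)

# Candidate forms: all simple paths from the word-initial to any word-final triphone.
ends  <- active[substr(active, 3, 3) == "#"]
paths <- all_simple_paths(g, from = "#bl", to = ends)
sapply(paths, function(p) paste(as_ids(p), collapse = "-"))
```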

From these analyses, it is clear that mapping nearly 4000 semantic vectors on their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_m^a from S_m and add the semantic vector s_a of the affix:

\[ S_a = S_m^a + \mathbf{i} \otimes s_a. \tag{27} \]

Here, i is a unit vector of length k and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a,

\[ S_a G_a = T_a, \tag{28} \]

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

\[ \begin{bmatrix} S_m \\ S_a \end{bmatrix} G_a = \begin{bmatrix} T_m \\ T_a \end{bmatrix}. \tag{29} \]

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

\[ \begin{bmatrix} S_m \\ S_{a_1} \\ S_{a_2} \\ \vdots \\ S_{a_n} \end{bmatrix} G = \begin{bmatrix} T_m \\ T_{a_1} \\ T_{a_2} \\ \vdots \\ T_{a_n} \end{bmatrix}. \tag{30} \]

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
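A minimal R sketch of this augmented set-up, with small random matrices standing in for S_m, T_m, and the inflected forms' triphone vectors, is given below (the names are ours, not those of the original scripts):

```r
library(MASS)  # ginv()

# Toy stand-ins: 5 base words with 4-dimensional semantic vectors and 6 triphone columns.
set.seed(3)
S_m <- matrix(rnorm(20), 5, 4)
T_m <- matrix(rbinom(30, 1, 0.4), 5, 6)

# An inflectional function a attested for words 1-3: add its semantic vector
# to the base vectors (eq. 27) and pair the result with the inflected forms' triphones.
s_a <- rnorm(4)
S_a <- S_m[1:3, ] + matrix(s_a, 3, 4, byrow = TRUE)   # k copies of s_a stacked row-wise
T_a <- matrix(rbinom(18, 1, 0.4), 3, 6)

# Augmented mapping (eq. 30): one G for base words and inflected words jointly.
S_aug <- rbind(S_m, S_a)
T_aug <- rbind(T_m, T_a)
G <- ginv(S_aug) %*% T_aug

# Predicted triphone support for a (novel) inflected variant of word 4, cf. (21).
t_hat <- (S_m[4, ] + s_a) %*% G
```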

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10,582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus,


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns) and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of \(\hat{T} = SG\) supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths \(\hat{T}\). The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for trigrams changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15,105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
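A model of this form can be specified with the mgcv package along the following lines; the data frame and variable names below are illustrative stand-ins, not those of the original analysis:

```r
library(mgcv)

# buckeye: hypothetical data frame with one row per token; variable names are illustrative.
#   LogRelDur      log relative duration of the branching segment
#   LogEdgeWeight  log LDL edge weight at the first branching point
#   PhraseRate     local speech rate
#   LogNcount      log neighborhood density
#   LogFrequency   log word frequency
#   Word, Speaker  factors, modeled as random intercepts
m <- bam(LogRelDur ~ LogEdgeWeight + PhraseRate + LogNcount +
           s(LogFrequency) +                       # thin plate regression spline
           s(Word, bs = "re") + s(Speaker, bs = "re"),
         data = buckeye)
summary(m)
```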

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients       Estimate   Std. Error    t-value    p-value
Intercept                         -1.9165      0.0592    -32.3908   < 0.0001
Log LDL Edge Weight               -0.1358      0.0400     -3.3960     0.0007
Phrase Rate                        0.0047      0.0021      2.2449     0.0248
Log Ncount                         0.1114      0.0189      5.8837   < 0.0001
B. smooth terms                         edf      Ref.df    F-value    p-value
TPRS log Frequency                   1.9007      1.9050     7.5495     0.0004
random intercepts word            1149.8839   1323.0000    31.4919   < 0.0001
random intercepts speaker           30.2995     39.0000    34.5579   < 0.0001

maps (TSOMs; [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
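In the igraph package, such a layout can be requested with layout_with_graphopt(); the following minimal sketch lays out a small chain of triphones (a toy stand-in, not the full graph of Figure 9):

```r
library(igraph)

# Minimal sketch: self-organize a small triphone graph in the plane with graphopt.
g <- make_graph(c("#bl","blE", "blE","lEn", "lEn","End",
                  "End","ndI", "ndI","dIN", "dIN","IN#"), directed = TRUE)
coords <- layout_with_graphopt(g)   # mass, charge, and spring forces until equilibrium
plot(g, layout = coords, vertex.size = 30, edge.arrow.size = 0.4)
```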

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises first of all an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to but not inside the cloud of their content


lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk ([natyrlk])) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors \(\hat{L}\). Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JaneClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
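The logic of this example can be sketched in R as follows; the semantic vectors and pairings below are random toy stand-ins for the vectors actually trained on the sentences of Table 8, and all names and helper functions are ours:

```r
library(MASS)  # ginv()

# Toy stand-ins for the entity semantic vectors trained on the sentences of Table 8.
set.seed(4)
entities <- c("JohnClark", "JohnWelsch", "JohnWiggam", "AnneHastie", "JanetClark")
S_ent <- matrix(rnorm(5 * 10), 5, 10, dimnames = list(entities, NULL))

# Word forms and the named-entity lexomes they were paired with during training.
words  <- c("John", "Clark", "Welsch", "Janet")
paired <- list(John   = c("JohnClark", "JohnWelsch", "JohnWiggam"),
               Clark  = c("JohnClark", "JanetClark"),
               Welsch = "JohnWelsch",
               Janet  = "JanetClark")

# L: for each word form, a stand-in semantic vector (mean of the vectors it was paired with).
L <- t(sapply(paired, function(e) colMeans(S_ent[e, , drop = FALSE])))

# C: letter trigram cue matrix for the word forms, with boundary markers.
trigrams <- function(w) { w <- paste0("#", w, "#")
  sapply(1:(nchar(w) - 2), function(i) substr(w, i, i + 2)) }
cues <- sort(unique(unlist(lapply(words, trigrams))))
C <- t(sapply(words, function(w) as.numeric(cues %in% trigrams(w))))

F_hat <- ginv(C) %*% L              # solve CF = L
L_hat <- C %*% F_hat                # estimated semantic vectors for the word forms
round(cor(t(L_hat), t(S_ent)), 2)   # cf. Figure 10: rows = word forms, columns = named entities
```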

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

[Figure 10: four panels (John, John Welsch, Janet, Clark), each plotting the Pearson correlation (y-axis, range -0.5 to 1.0) of the estimated semantic vector with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark (x-axis).]

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.


Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.


[Figure 11: network diagram linking cochlea and retina (input), auditory cues (FBS features), orthographic cues (trigrams), semantic vectors (TASA), auditory targets (triphones), orthographic targets (trigrams), and speaking and typing (output), with arcs labeled C_a, C_o, F_a, F_o, K_a, H_a, G_a, G_o, T_a, T_o, and S.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.


Incremental learning. In the present study we estimated the weights on the connections of these networks with matrix operations, but, importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
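As a minimal sketch of such incremental estimation, the R function below applies the Widrow-Hoff (least-mean-squares) rule one learning event at a time; the function name, the zero initialization, and the learning rate are our own illustrative choices rather than part of the model specification.

```r
# Incremental Widrow-Hoff updating of a linear network (illustrative sketch).
# C: matrix of cue vectors, one learning event per row
# S: matrix of target semantic vectors, same number of rows as C
# eta: learning rate (illustrative value)
widrow_hoff <- function(C, S, eta = 0.01) {
  W <- matrix(0, nrow = ncol(C), ncol = ncol(S))  # start from zero weights
  for (i in seq_len(nrow(C))) {                   # one learning event at a time
    cue    <- C[i, , drop = FALSE]                # 1 x (number of cues)
    target <- S[i, , drop = FALSE]                # 1 x (semantic dimensions)
    error  <- target - cue %*% W                  # prediction error
    W      <- W + eta * t(cue) %*% error          # delta-rule weight update
  }
  W
}
```

With a small learning rate and repeated presentation of the learning events, the weights obtained in this way approach, in the mean, those estimated with the generalized inverse, which is the equivalence appealed to below in the discussion of exemplars.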

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures (typically represented by graphs constructed over semantic primitives; see, e.g., [154]) are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155-157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over a thousand function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference for sequences of words, and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are, of course, context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning, and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear: networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\mathbf{a} = (3, 4), \quad \mathbf{b} = (-2, -3) \quad (A1)



Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \quad (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \quad (A3)

Let us assume that points a and b represent the forms of two words, ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix} \quad (A4)

Such a mapping of A onto B is given by a matrix F,

\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} \quad (A5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \quad (A6)

Using matrix notation, we can write

\mathbf{A}\mathbf{F} = \mathbf{B} \quad (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \quad (A8)

For numbers, the inverse of multiplication with s is dividing by s:

\left(\frac{1}{s}\right)(sx) = x \quad (A9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I} \quad (A10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X} \quad (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

\begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \end{aligned} \quad (A12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\mathbf{B}\mathbf{G} = \mathbf{A} \quad (A13)

Solving this equation proceeds exactly as above:

\begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \end{aligned} \quad (A14)


The inverse of B is

\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \quad (A15)

and hence we have

\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} = \mathbf{G} \quad (A16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
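For readers who want to check the worked example numerically, the short R snippet below (ours, not part of the original appendix) computes the mappings with MASS::ginv; for these invertible toy matrices the generalized inverse coincides with the ordinary inverse.

```r
# Numeric check of the worked example using the generalized inverse.
library(MASS)

A <- matrix(c( 3,  4, -2, -3), nrow = 2, byrow = TRUE)  # form vectors as rows
B <- matrix(c(-5,  2,  4, -1), nrow = 2, byrow = TRUE)  # semantic vectors as rows

Fmat <- ginv(A) %*% B  # comprehension mapping, solving A F = B; equals F in (A5)
Gmat <- ginv(B) %*% A  # production mapping, solving B G = A; equals G in (A16)

A %*% Fmat             # reproduces B
B %*% Gmat             # reproduces A
```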

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix} \quad (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\mathbf{s} = (2, 2) \quad (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

\begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix} \quad (A19)
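As a quick check of this productivity property, the snippet below (again ours, for illustration only) pushes both a training form vector and the novel vector s through the matrix F of (A5):

```r
# The same matrix F maps trained and novel form vectors onto semantic vectors.
F_map <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)  # the matrix F of (A5)

c(3, 4) %*% F_map  # trained form vector: yields (-5, 2), cf. (A17)
c(2, 2) %*% F_map  # novel form vector s: yields (-2, 2), cf. (A19)
```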

[Figure 13: two input nodes i1 and i2 (form features go in here), two output nodes o1 and o2 (semantic vectors come out here); the connection weights 1, 2, -2, -1 are the transformation matrix.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5, and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
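The sketch below illustrates the core of such a procedure with igraph. It is our own illustration under stated assumptions, not the code used in the study: it assumes a character vector of candidate triphones, a named numeric vector of network support values for those triphones, and '#' as the word boundary symbol.

```r
# Illustrative sketch of graph-based triphone sequencing with igraph.
# Assumed inputs: 'triphones', candidate triphones such as "#sa", "sap", "ap#",
# and 'support', a named numeric vector of network support for each triphone.
library(igraph)

build_triphone_graph <- function(triphones) {
  # edge from u to v whenever segments 2-3 of u match segments 1-2 of v
  edges <- do.call(rbind, lapply(triphones, function(u) {
    v <- triphones[substr(triphones, 1, 2) == substr(u, 2, 3)]
    if (length(v) > 0) cbind(from = u, to = v) else NULL
  }))
  graph_from_data_frame(as.data.frame(edges), directed = TRUE,
                        vertices = data.frame(name = triphones))
}

best_path <- function(g, support) {
  initial <- grep("^#", names(support), value = TRUE)  # word-initial triphones
  final   <- grep("#$", names(support), value = TRUE)  # word-final triphones
  paths <- unlist(lapply(initial, function(v)
    all_simple_paths(g, from = v, to = final)), recursive = FALSE)
  # divide raw path support by path length, so that longer paths do not
  # trivially receive more support, and return the best-weighted path
  scores <- vapply(paths, function(p) sum(support[as_ids(p)]) / length(p), 0)
  as_ids(paths[[which.max(scores)]])
}
```

In the actual procedure described above, vertex selection is restricted by the stem, whole-word, and affix support thresholds before the graph is built; the sketch omits these filtering steps.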

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.





For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL. R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA; London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215-216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostic, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


Table 6: Summary of a generalized additive model fitted to response latencies in visual lexical decision using measures from ndl. te: tensor product smooth.

A. parametric coefficients                    Estimate    Std. Error    t-value     p-value
intercept                                     -1.6934     0.0086        -195.933    < 0.0001
word type: Inflected                           0.0286     0.0052           5.486    < 0.0001
word type: Monomorphemic                       0.0042     0.0055           0.770    = 0.44
word length                                    0.0066     0.0011           5.893    < 0.0001

B. smooth terms                                edf         Ref.df        F           p-value
te(activation, activation diversity, prior)    41.34       49.72         10.84      < 0.0001

than the number of trigram features (3465 trigrams versus 5929 triphones). More features, which mathematically amount to more predictors, enable more precise mappings. Furthermore, the heavy use made in English of letter sequences such as ough (with 10 different pronunciations, https://www.dictionary.com/e/s/ough) reduces semantic discriminability compared to the corresponding spoken forms. It is noteworthy, however, that the benefits of the triphone-to-semantics mapping are possible only thanks to the high accuracy with which orthographic trigram vectors are mapped onto phonological triphone vectors (92%).

It is unlikely that the 'phonological route' is always dominant in silent reading. Especially in fast 'diagonal' reading, the 'direct route' may be more dominant. There is remarkable, although for the present authors unexpected, convergence with the dual route model for reading aloud of Coltheart et al. [108] and Coltheart [109]. However, while the text-to-phonology route of their model has as primary function to explain why nonwords can be pronounced, our results show that both routes can actually be active when silently reading real words. A fundamental difference is, however, that in our model words' semantics play a central role.

4.3. Auditory Comprehension. For the modeling of reading, we made use of letter trigrams as cues. These cues abstract away from the actual visual patterns that fall on the retina, patterns that are already transformed at the retina before being sent to the visual cortex. Our hypothesis is that letter trigrams represent those high-level cells or cell assemblies in the visual system that are critical for reading, and we therefore leave the modeling, possibly with deep learning networks, of how patterns on the retina are transformed into letter trigrams for further research.

As a consequence of the high level of abstraction of the trigrams, a word form is represented by a unique vector specifying which of a fixed set of letter trigrams is present in the word. Although one might consider modeling auditory comprehension with phone triplets (triphones) replacing the letter trigrams of visual comprehension, such an approach would not do justice to the enormous variability of actual speech. Whereas the modeling of reading printed words can depart from the assumption that the pixels of a word's letters on a computer screen are in a fixed configuration, independently of where the word is shown on the screen, the speech signal of the same word type varies from token to token, as illustrated in Figure 7 for the English word crisis. A survey of the Buckeye corpus [110] of spontaneous conversations recorded at Columbus, Ohio [111], indicates that around 5% of the words are spoken with one syllable missing and that a little over 20% of words have at least one phone missing.

It is widely believed that the phoneme, as an abstract unit of sound, is essential for coming to grips with the huge variability that characterizes the speech signal [112–114]. However, the phoneme as theoretical linguistic construct is deeply problematic [93], and for many spoken forms canonical phonemes do not do justice to the phonetics of the actual sounds [115]. Furthermore, if words are defined as sequences of phones, the problem arises what representations to posit for words with two or more reduced variants. Adding entries for reduced forms to the lexicon turns out not to afford better overall recognition [116]. Although exemplar models have been put forward to overcome this problem [117], we take a different approach here and, following Arnold et al. [18], lay out a discriminative approach to auditory comprehension.

The cues that we make use of to represent the acoustic signal are the Frequency Band Summary Features (FBSFs) introduced by Arnold et al. [18] as input cues. FBSFs summarize the information present in the spectrogram of a speech signal. The algorithm that derives FBSFs first chunks the input at the minima of the Hilbert amplitude envelope of the signal's oscillogram (see the upper panel of Figure 8). For each chunk, the algorithm distinguishes 21 frequency bands in the MEL scaled spectrum, and intensities are discretized into 5 levels (lower panel in Figure 8) for small intervals of time. For each chunk and for each frequency band in these chunks, a discrete feature is derived that specifies chunk number, frequency band number, and a description of the temporal variation in the band, bringing together minimum, maximum, median, initial, and final intensity values. The 21 frequency bands are inspired by the 21 receptive areas on the cochlear membrane that are sensitive to different ranges of frequencies [118]. Thus, a given FBSF is a proxy for cell assemblies in the auditory cortex that respond to a particular pattern of changes over time in spectral intensity. The AcousticNDLCodeR package [119] for R [120] was employed to extract the FBSFs from the audio files.

We tested ldl on 20 hours of speech sampled from the audio files of the UCLA Library Broadcast NewsScape data, a vast repository of multimodal TV news broadcasts, provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts where there is speech without background noise or music, and noisy for speech snippets where background noise or music is present. Here, we report results for 20 hours of clean speech, to a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

Figure 7: Oscillograms for two different realizations of the word crisis, with different degrees of reduction, in the NewsScape archive: [kraız] (left) and [khraısıs] (right).

Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal, shown in the top panel. The lower panel depicts the discretized MEL scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

The FBSFs for the word tokens are brought together in a matrix $C_a$ with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating $C'S$, the calculation of $C'$ is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by $X^T$, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus $\left(\begin{smallmatrix} 3 & 8 \\ 7 & 2 \end{smallmatrix}\right)^T = \left(\begin{smallmatrix} 3 & 7 \\ 8 & 2 \end{smallmatrix}\right)$. For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

$$
\begin{aligned}
C F &= S \\
C^T C F &= C^T S \\
(C^T C)^{-1} (C^T C) F &= (C^T C)^{-1} C^T S \\
F &= (C^T C)^{-1} C^T S
\end{aligned}
\qquad (17)
$$

In this way, matrix inversion is required only for the much smaller matrix $C^T C$, which is a square matrix of size 40639 × 40639.

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.
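The estimation and evaluation steps just described can be summarized in a short R sketch; the toy matrices below merely stand in for the 131673 × 40639 FBSF cue matrix and the corresponding semantic matrix, and MASS::ginv is used only so that the sketch does not fail if the crossproduct matrix happens to be singular.

    # Toy stand-ins for the cue matrix C (tokens x cues) and the
    # semantic matrix S (tokens x semantic dimensions).
    set.seed(1)
    n_tokens <- 50; n_cues <- 20; n_dims <- 10
    C <- matrix(rbinom(n_tokens * n_cues, 1, 0.3), nrow = n_tokens)
    S <- matrix(rnorm(n_tokens * n_dims), nrow = n_tokens)

    # Equation (17): F = (C^T C)^{-1} C^T S.
    F <- MASS::ginv(crossprod(C)) %*% crossprod(C, S)
    S_hat <- C %*% F                                   # predicted semantic vectors

    # A token counts as recognized when its predicted vector correlates more
    # strongly with its own gold vector than with any other gold vector.
    r <- cor(t(S_hat), t(S))                           # predicted-by-gold correlations
    mean(apply(r, 1, which.max) == seq_len(n_tokens))  # proportion correct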

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower, for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension, but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

$$
S = \begin{array}{r|ccc}
             & \text{one} & \text{two} & \text{three} \\ \hline
\text{one}   & 1.0 & 0.3 & 0.4 \\
\text{two}   & 0.2 & 1.0 & 0.1 \\
\text{three} & 0.1 & 0.1 & 1.0
\end{array}
\qquad (18)
$$

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


$$
T = \begin{array}{r|cccccccc}
             & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one}   & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{two}   & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
\text{three} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{array}
\qquad (19)
$$

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

$$ S G = T. \qquad (20) $$

Given G, we can predict, for any semantic vector s, the vector of triphones $\hat{t}$ that quantifies the support for the triphones provided by s, simply by multiplying s with G:

$$ \hat{t} = s G. \qquad (21) $$

As before, the transformation matrix G is straightforward to estimate. Let $S'$ denote the Moore-Penrose generalized inverse of S. Since

$$ S' S G = S' T, \qquad (22) $$

we have

$$ G = S' T. \qquad (23) $$

Given G, we can predict the triphone matrix $\hat{T}$ from the semantic matrix S:

$$ S G = \hat{T}. \qquad (24) $$

For the present example, the inverse of S, $S'$, is

$$
S' = \begin{array}{r|ccc}
             & \text{one} & \text{two} & \text{three} \\ \hline
\text{one}   & 1.10 & -0.29 & -0.41 \\
\text{two}   & -0.21 & 1.07 & -0.02 \\
\text{three} & -0.09 & -0.08 & 1.04
\end{array}
\qquad (25)
$$

and the transformation matrix G is

$$
G = \begin{array}{r|cccccccc}
             & \text{\#wV} & \text{wVn} & \text{Vn\#} & \text{\#tu} & \text{tu\#} & \text{\#Tr} & \text{Tri} & \text{ri\#} \\ \hline
\text{one}   & 1.10 & 1.10 & 1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \\
\text{two}   & -0.21 & -0.21 & -0.21 & 1.07 & 1.07 & -0.02 & -0.02 & -0.02 \\
\text{three} & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 & 1.04 & 1.04 & 1.04
\end{array}
\qquad (26)
$$

For this simple example, $\hat{T}$ is virtually identical to T. For realistic data, $\hat{T}$ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.

We made use of the S matrix, which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors $S_m$ (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix $T_m$ (a submatrix of T). The number of columns of $S_m$ was set to the number of different triphones (the column dimension of $T_m$). Those column vectors of $S_m$ were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all simple paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
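A minimal sketch of this path reconstruction with the igraph package, using the triphones of the toy word one (in the actual model, the vertices and edges are of course derived from the triphones whose predicted support exceeds the threshold):

    library(igraph)

    # Triphones with support above the threshold, linked whenever the final two
    # segments of one triphone match the initial two segments of the next
    # (here spelled out by hand for the word 'one').
    edges <- matrix(c("#wV", "wVn",
                      "wVn", "Vn#"),
                    ncol = 2, byrow = TRUE)
    g <- graph_from_edgelist(edges, directed = TRUE)

    # All simple paths that start at a word-initial (left-edge) triphone.
    paths <- all_simple_paths(g, from = "#wV", mode = "out")

    # The longest path spells out the word's triphone sequence.
    longest <- paths[[which.max(lengths(paths))]]
    as_ids(longest)   # "#wV" "wVn" "Vn#"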

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses, it is clear that mapping nearly 4000 semantic vectors on their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let $S_m$ and $T_m$ denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let $T_a$ denote the matrix with the triphone vectors of these inflected words, and let $S_a$ denote the corresponding semantic vectors. To obtain $S_a$, we take the pertinent submatrix $S_{m_a}$ from $S_m$ and add the semantic vector $s_a$ of the affix:

$$ S_a = S_{m_a} + \mathbf{i} \otimes s_a. \qquad (27) $$

Here, $\mathbf{i}$ is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of $s_a$ row-wise. As a first step, we could define a separate mapping $G_a$ for each inflectional function a:

$$ S_a G_a = T_a, \qquad (28) $$

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

$$
\begin{bmatrix} S_m \\ S_a \end{bmatrix} G_a = \begin{bmatrix} T_m \\ T_a \end{bmatrix}.
\qquad (29)
$$

The dimension of $G_a$ (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

$$
\begin{bmatrix} S_m \\ S_{a_1} \\ S_{a_2} \\ \vdots \\ S_{a_n} \end{bmatrix} G =
\begin{bmatrix} T_m \\ T_{a_1} \\ T_{a_2} \\ \vdots \\ T_{a_n} \end{bmatrix}.
\qquad (30)
$$

The dimension of G is identical to that of $G_a$, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
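A schematic R illustration of (27) and (30) is given below; the dimensions and values are hypothetical stand-ins, and only the sequence of matrix operations mirrors the procedure described in the text.

    set.seed(1)
    n_dims <- 6; n_tri <- 9
    S_m <- matrix(rnorm(4 * n_dims), nrow = 4)           # semantic vectors of 4 base words
    T_m <- matrix(rbinom(4 * n_tri, 1, 0.4), nrow = 4)   # their triphone indicator vectors

    # Equation (27): inflected words inherit the base vector plus the affix vector s_a
    # (in the text, only the base words attested with function a; here all of them).
    s_a <- rnorm(n_dims)
    S_a <- S_m + matrix(s_a, nrow = 4, ncol = n_dims, byrow = TRUE)
    T_a <- matrix(rbinom(4 * n_tri, 1, 0.4), nrow = 4)   # triphones of the inflected forms

    # Equation (30): one mapping G estimated jointly for base and inflected rows.
    S_aug <- rbind(S_m, S_a)
    T_aug <- rbind(T_m, T_a)
    G <- MASS::ginv(S_aug) %*% T_aug
    T_hat <- S_aug %*% G                                  # predicted triphone supports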

We selected 6595 inflected variants, which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector $\hat{t}$.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns), and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of $\hat{T} = SG$ supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths $\hat{T}$. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.

Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
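The model summarized in Table 7 can be specified with the mgcv package along the following lines; the data frame buckeye and its column names are placeholders for the dataset described above (with word and speaker as factors), and the simulated values serve only to make the sketch self-contained.

    library(mgcv)

    # Toy stand-in for the Buckeye dataset described in the text.
    set.seed(1)
    n <- 400
    buckeye <- data.frame(
      log_relative_duration = rnorm(n),
      log_LDL_edge_weight   = rnorm(n),
      phrase_rate           = rnorm(n),
      log_Ncount            = rnorm(n),
      log_frequency         = rnorm(n),
      word                  = factor(sample(letters[1:20], n, replace = TRUE)),
      speaker               = factor(sample(1:10, n, replace = TRUE))
    )

    # GAMM as in Table 7: linear terms, a thin plate regression spline for
    # log frequency, and random intercepts for word and speaker.
    m <- bam(log_relative_duration ~
               log_LDL_edge_weight + phrase_rate + log_Ncount +
               s(log_frequency) +
               s(word, bs = "re") + s(speaker, bs = "re"),
             data = buckeye)
    summary(m)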

The adverse effects of weaknesses in a word's path in the graph, where the path branches, are of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients    Estimate      Std. Error    t-value      p-value
Intercept                     -1.9165       0.0592        -32.3908     < 0.0001
Log LDL Edge Weight           -0.1358       0.0400         -3.3960       0.0007
Phrase Rate                    0.0047       0.0021          2.2449       0.0248
Log Ncount                     0.1114       0.0189          5.8837     < 0.0001

B. smooth terms                 edf           Ref.df        F-value      p-value
TPRS log Frequency              1.9007        1.9050         7.5495       0.0004
random intercepts word       1149.8839     1323.0000        31.4919     < 0.0001
random intercepts speaker      30.2995       39.0000        34.5579     < 0.0001

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs; [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
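For completeness, a minimal sketch of how such a graphopt layout can be obtained with igraph, using a hand-coded fragment of the blending graph of Figure 9:

    library(igraph)

    # A fragment of the triphone graph for 'blending' (cf. Figure 9).
    edges <- matrix(c("bl",  "blE",
                      "blE", "lEn",
                      "lEn", "End",
                      "End", "ndI",
                      "ndI", "dIN"),
                    ncol = 2, byrow = TRUE)
    g <- graph_from_edgelist(edges, directed = TRUE)

    # graphopt: a force-directed layout that places the vertices in the plane.
    coords <- layout_with_graphopt(g)
    plot(g, layout = coords)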

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm, but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one; it is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncracies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncracies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk, [natyrlək]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L, specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...), was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors $\hat{L}$. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
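The sequence of steps just described can be sketched as follows; the matrices are small random stand-ins (in the text, C is built from letter trigrams and L from the lexome vectors), so the snippet only mirrors the procedure, not the actual data.

    set.seed(1)
    n_forms <- 6; n_trigrams <- 15; n_dims <- 5
    C <- matrix(rbinom(n_forms * n_trigrams, 1, 0.3), nrow = n_forms)  # word forms x trigram cues
    L <- matrix(rnorm(n_forms * n_dims), nrow = n_forms)               # word forms x lexome dimensions

    F <- MASS::ginv(C) %*% L          # solve CF = L in the least squares sense
    L_hat <- C %*% F                  # estimated semantic vectors per word form

    # For a sequentially read name, the vectors generated by the two word forms
    # are summed (rows 1 and 2 stand in for 'John' and 'Welsch') and then
    # correlated with each targeted named-entity vector.
    composite <- L_hat[1, ] + L_hat[2, ]
    apply(L, 1, function(gold) cor(composite, gold))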

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero, or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network $F_a$ that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network $F_o$ onto S.

For reading, we also implemented a network, here denoted by $K_a$, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network $H_a$ onto the semantic vectors S. This network is motivated not only by our observation that for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network $H_a$ is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.


Speech production is modeled by network $G_a$, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

28 Complexity

typing speaking

orthographic targets

trigrams

auditory targets

triphones

semantic vectors

TASA

auditory cues

FBS features

orthographic cues

trigrams

cochlea retina

To Ta

Ha

GoGa

KaS

Fa Fo

CoCa

Figure 11 Overview of the discriminative lexicon Input and output systems are presented in light gray the representations of the model areshown in dark gray Each of these representations is a matrix with its own set of features Arcs are labeled with the discriminative networks(transformationmatrices) thatmapmatrices onto each otherThe arc from semantic vectors to orthographic vectors is in gray as thismappingis not explored in the present study

in Baayen et al [85] In their production model for Latininflection feedback from the projected triphone paths backto the semantics (synthesis by analysis) is allowed so thatof otherwise equally well-supported paths the path that bestexpresses the targeted semantics is selected For informationflows between systems in speech production see Hickok[147]) In what follows we touch upon a series of issues thatare relevant for evaluating our model

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but, importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
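By way of illustration, the following minimal R sketch (with hypothetical toy cues and targets, not the networks of this study) shows trial-by-trial updating of a weight matrix with the Widrow-Hoff rule, which adjusts the weights from the cues present on a learning event in proportion to the prediction error:

    # Widrow-Hoff (delta rule) updating of a cue-to-outcome weight matrix
    widrow_hoff <- function(W, cue, target, eta = 0.01) {
      error <- target - cue %*% W        # prediction error for this learning event
      W + eta * t(cue) %*% error         # strengthen weights from the active cues
    }
    set.seed(1)
    true_map <- matrix(rnorm(5 * 3), 5, 3)         # a hypothetical cue-to-semantics mapping
    W <- matrix(0, 5, 3)                           # the network starts empty
    for (i in 1:5000) {                            # a stream of learning events
      cue <- matrix(rbinom(5, 1, 0.5), nrow = 1)   # which cues are present on this trial
      W <- widrow_hoff(W, cue, cue %*% true_map)   # target: the event's semantic vector
    }
    round(W - true_map, 2)                         # W approximates the underlying mapping

Over many learning events the incrementally updated weights approach the weights obtained in one step with the (generalized) inverse.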

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155-157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the southeast of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no fewer than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to update network weights very efficiently during incremental learning. In ldl, by contrast, this independence no longer holds; learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lowercase letters in bold, and we place the x coordinate of a point next to the y coordinate:

$$\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A.1}$$

[Figure 12: a scatterplot of points a = (3, 4) and b = (-2, -3), together with points x and y, in the x-y plane.]

Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

$$\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A.2}$$

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

$$\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A.3}$$

Let us assume that points a and b represent the forms of two words, ω₁ and ω₂, and that the points x and y represent the meanings of these two words (below we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

$$\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A.4}$$

Such a mapping of A onto B is given by a matrix F,

$$\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A.5}$$

and it is straightforward to verify that indeed

$$\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A.6}$$

Using matrix notation, we can write

$$\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A.7}$$

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A.8}$$

For numbers, the inverse of multiplication with s is dividing by s:

$$\left(\frac{1}{s}\right)(sx) = x. \tag{A.9}$$

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:

$$\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A.10}$$

For nonsquare matrices, the inverse Y⁻¹ is defined such that

$$\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A.11}$$

We find the square matrix F that maps the square matrix A onto B as follows:

$$\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B} \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \tag{A.12}$$

Since for the present example A and its inverse happen to be identical (A⁻¹ = A), we obtain (A.6).
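These calculations are easy to verify in R (a self-contained sketch of the toy example, not code from the study itself):

    A <- matrix(c( 3,  4,
                  -2, -3), nrow = 2, byrow = TRUE)   # rows: form vectors a and b
    B <- matrix(c(-5,  2,
                   4, -1), nrow = 2, byrow = TRUE)   # rows: semantic vectors x and y
    Fmat <- solve(A) %*% B    # F = A^{-1} B, the comprehension mapping of (A.5)
    A %*% Fmat                # reproduces B, cf. (A.6) and (A.7)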

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

$$\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A.13}$$

Solving this equation proceeds exactly as above:

$$\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A} \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \tag{A.14}$$

The inverse of B is

$$\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}, \tag{A.15}$$

and hence we have

$$\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} = \mathbf{G}. \tag{A.16}$$

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
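Continuing the toy example in R, the production mapping G of (A.16) can also be obtained with the generalized inverse (a sketch; here B happens to be invertible, so ginv(B) coincides with the exact inverse):

    library(MASS)             # ginv(): Moore-Penrose generalized inverse
    Gmat <- ginv(B) %*% A     # G = B'A, the production mapping of (A.14)
    round(B %*% Gmat, 10)     # recovers A: semantic vectors mapped back onto form vectors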

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

$$(3\ \ 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5\ \ 2). \tag{A.17}$$

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i₁ and i₂ are set to 3 and 4, respectively. To obtain the activations o₁ = -5 and o₂ = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o₁ we obtain the value 3 × 1 + 4 × -2 = -5, and for o₂ we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

$$\mathbf{s} = (2\ \ 2), \tag{A.18}$$

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

$$(2\ \ 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2\ \ 2). \tag{A.19}$$
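In R, continuing the toy example above, the novel form vector is mapped as follows (a sketch):

    s <- matrix(c(2, 2), nrow = 1)   # a novel form vector
    s %*% Fmat                       # its predicted semantic vector: (-2, 2)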

[Figure 13 diagram: input nodes i₁ and i₂ (form features go in here), output nodes o₁ and o₂ (semantic vectors come out here); the connection weights 1, 2, -2, -1 on the edges are the transformation matrix.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i₁ and i₂ (in blue), and two output nodes, o₁ and o₂ (in red). When the input vector (3, 4) is presented to the network, the value at o₁ is 3 × 1 + 4 × -2 = -5, and that at o₂ is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at that vertex are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
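A minimal sketch of this path-finding step in R with igraph, for a hypothetical set of candidate triphones and made-up support values (the thresholding and repair steps of the full heuristic are omitted):

    library(igraph)
    # hypothetical candidate triphones with made-up network support
    support <- c("#wO" = 0.8, "wOk" = 0.7, "Oks" = 0.5, "ks#" = 0.6, "Okt" = 0.2, "kt#" = 0.3)
    tri <- names(support)
    # edge whenever segments 2-3 of the first triphone equal segments 1-2 of the second
    edges <- do.call(rbind, lapply(tri, function(v1)
      do.call(rbind, lapply(tri, function(v2)
        if (substr(v1, 2, 3) == substr(v2, 1, 2)) c(v1, v2) else NULL))))
    g <- graph_from_edgelist(edges, directed = TRUE)
    finals <- tri[substr(tri, 3, 3) == "#"]                  # word-final triphones
    paths  <- all_simple_paths(g, from = "#wO", to = finals) # all paths from the initial triphone
    # weight raw path support by path length and select the best-supported path
    weighted <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
    as_ids(paths[[which.max(weighted)]])

For this toy graph, the path #wO, wOk, Oks, ks# receives the highest length-weighted support and would be selected as the acoustic image.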

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

[Figure 14: heatmap with pronouns and prepositions on both axes; the color key (with histogram) spans correlation values from approximately -0.1 to 0.25.]

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An Amorphous Model for Morphological Processing in Visual Comprehension Based on Naive Discriminative Learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL, R package version 1.0.1," 2017.

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.

[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.

[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.

[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.

[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.

[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.

[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.

[130] R. Bertram, F. E. Tønnessen, S. Stromqvist, J. Hyona, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.

[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.

[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.

[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.

[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.

[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.

[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.

[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.

[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.

[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.

[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.

[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.

[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.

[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.

[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.

[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.

[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.

[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.

[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.

[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, Universities of Siegen, Tübingen, and Nijmegen, 2018.

[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.

[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.

[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.

[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.

[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.

[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.

[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.

[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.

[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.

[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.

[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.

[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.

[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.

[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.

[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.

[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.

[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.

[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.

[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.

[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of F1 and F2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.

[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.

[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.

[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.

[178] A. Kostic, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.

[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.

[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.

[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.



Figure 7: Oscillograms for two different realizations of the word crisis with different degrees of reduction in the NewsScape archive: [kraız] (left) and [khraısıs] (right).


Figure 8: Oscillogram of the speech signal for a realization of the word economic, with the Hilbert envelope (in orange) outlining the signal shown in the top panel. The lower panel depicts the discretized MEL-scaled spectrum of the signal. The vertical bars are the boundary points that partition the signal into 4 chunks. For one chunk, horizontal lines (in red) highlight one of the frequency bands for which the FBSFs provide summaries of the variation over time in that frequency band. The FBSF for the highlighted band is "band11-start3-median3-min2-max5-end3-part2".

provided to us by the Distributed Little Red Hen Lab. The audio files of this resource were automatically classified as clean for relatively clean parts, where there is speech without background noise or music, and as noisy for speech snippets where background noise or music is present. Here we report results for 20 hours of clean speech, comprising a total of 131673 word tokens (representing 4779 word types), with in all 40639 distinct FBSFs.

The FBSFs for the word tokens are brought together in a matrix C_a with dimensions 131673 audio tokens × 40639 FBSFs. The targeted semantic vectors are taken from the S matrix, which is expanded to a matrix with 131673 rows, one for each audio token, and 4609 columns, the dimension of the semantic vectors. Although the transformation matrix F could be obtained by calculating C'S, the calculation of C' is numerically expensive. To reduce computational costs, we calculated F as follows. (The transpose of a square matrix X, denoted by X^T, is obtained by replacing the upper triangle of the matrix by the lower triangle and vice versa; thus the transpose of the matrix with rows (3 8) and (7 2) is the matrix with rows (3 7) and (8 2). For a nonsquare matrix, the transpose is obtained by switching rows and columns; thus a 4 × 2 matrix becomes a 2 × 4 matrix when transposed.)

CF = S
C^T C F = C^T S
(C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
F = (C^T C)^{-1} C^T S                                             (17)

In this way, matrix inversion is required for a much smaller matrix, C^T C, which is a square matrix of size 40639 × 40639.
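By way of illustration, the following minimal numpy sketch contrasts the direct pseudoinverse route with the normal-equation shortcut of (17). The matrix names follow the text, but the toy dimensions and random data are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real matrices: C (tokens x FBSF cues) and S
# (tokens x semantic dimensions); the real C is 131673 x 40639.
C = rng.normal(size=(200, 50))
S = rng.normal(size=(200, 30))

# Direct route: F = C'S with the Moore-Penrose generalized inverse.
F_direct = np.linalg.pinv(C) @ S

# Normal-equation route from (17): F = (C^T C)^{-1} C^T S, which only
# requires solving a system in the much smaller (cues x cues) matrix C^T C.
CtC = C.T @ C
F_normal = np.linalg.solve(CtC, C.T @ S)

print(np.allclose(F_direct, F_normal, atol=1e-8))   # True when C has full column rank
```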

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words, the mean correlation was 0.72; for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.
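The evaluation procedure can be sketched as follows. Rather than materializing the full 131673 × 131673 correlation matrix, this illustrative version computes all pairwise correlations row-wise on small random stand-ins for the real matrices; the helper function and variable names are ours, not part of the published implementation.

```python
import numpy as np

def precision_at_one(S_hat, S_gold):
    """Proportion of predicted vectors whose highest Pearson correlation
    with any gold semantic vector is with their own target row."""
    def zscore_rows(X):
        # z-score rows so that a scaled dot product equals a Pearson correlation
        X = X - X.mean(axis=1, keepdims=True)
        return X / X.std(axis=1, keepdims=True)
    A, B = zscore_rows(S_hat), zscore_rows(S_gold)
    corr = A @ B.T / A.shape[1]          # all pairwise correlations
    best = corr.argmax(axis=1)           # index of best-correlated gold vector
    return (best == np.arange(len(best))).mean()

# Example with random stand-ins (the real matrices are 131673 x 4609):
rng = np.random.default_rng(1)
S_gold = rng.normal(size=(100, 40))
S_hat = S_gold + rng.normal(scale=0.5, size=S_gold.shape)   # noisy predictions
print(precision_at_one(S_hat, S_gold))
```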

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-way classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that, as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render the phoneme problematic as the elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125] or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

S =
          one    two    three
one       1.0    0.3    0.4
two       0.2    1.0    0.1
three     0.1    0.1    1.0                                        (18)

as well as a C indicator matrix specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T =
          #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
one        1     1     1     0     0     0     0     0
two        0     0     0     1     1     0     0     0
three      0     0     0     0     0     1     1     1             (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S' denote the Moore-Penrose generalized inverse of S. Since

S'SG = S'T                                                         (22)

we have

G = S'T                                                            (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂                                                             (24)

For the present example, the generalized inverse S' of S is

S' =
          one     two     three
one       1.10   -0.29   -0.41
two      -0.21    1.07   -0.02
three    -0.09   -0.08    1.04                                     (25)

and the transformation matrix G is

G =
          #wV    wVn    Vn#    #tu    tu#    #Tr    Tri    ri#
one       1.10   1.10   1.10  -0.29  -0.29  -0.41  -0.41  -0.41
two      -0.21  -0.21  -0.21   1.07   1.07  -0.02  -0.02  -0.02
three    -0.09  -0.09  -0.09  -0.08  -0.08   1.04   1.04   1.04    (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
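The toy computation can be replicated in a few lines of numpy; the matrices below are those of (18) and (19), and np.linalg.pinv stands in for the generalized inverse S'. This is an illustrative sketch, not the original implementation.

```python
import numpy as np

# Toy semantic matrix S (rows: one, two, three) and triphone matrix T
# with columns #wV, wVn, Vn#, #tu, tu#, #Tr, Tri, ri#, as in (18)-(19).
S = np.array([[1.0, 0.3, 0.4],
              [0.2, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
T = np.array([[1, 1, 1, 0, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, 1]], dtype=float)

# Production mapping G = S'T, with S' the generalized inverse of S.
G = np.linalg.pinv(S) @ T
print(np.round(np.linalg.pinv(S), 2))   # matches (25)
print(np.round(G, 2))                   # matches (26)

# Support for the triphones of 'two': multiply its semantic vector with G.
t_hat = S[1] @ G
print(np.round(t_hat, 2))               # close to the row for 'two' in T
```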

We made use of the S matrix which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.
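Trimming the semantic matrix by column variance can be done directly in numpy. The helper below is our own illustrative sketch of this step, not code from the published model.

```python
import numpy as np

def reduce_columns(S, n):
    """Retain the n columns of S with the highest variance, discarding
    near-constant (uninformative) semantic dimensions."""
    variances = S.var(axis=0)
    keep = np.argsort(variances)[::-1][:n]      # indices of the top-n variances
    return S[:, np.sort(keep)]

# Example: keep as many columns as there are distinct triphones (here 5).
rng = np.random.default_rng(2)
S = rng.normal(scale=np.array([2.0, 0.01, 1.5, 0.02, 3.0, 0.05, 1.0, 0.03]),
               size=(10, 8))
print(reduce_columns(S, 5).shape)   # (10, 5)
```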

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all simple paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
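The path reconstruction can be emulated with any graph library. The sketch below uses networkx as a stand-in for the igraph function mentioned above; the triphone inventory for blending is written here with '#' marking word boundaries, an illustrative convention rather than necessarily the coding used in the original study.

```python
import itertools
import networkx as nx

def order_triphones(active):
    """Given the set of well-supported triphones (activation > threshold),
    build a directed graph in which triphone 'abc' connects to 'bcd', and
    return the longest simple path from a word-initial triphone (starting
    with '#') to a word-final one (ending with '#')."""
    g = nx.DiGraph()
    g.add_nodes_from(active)
    for a, b in itertools.permutations(active, 2):
        if a[1:] == b[:-1]:            # two-phone overlap, e.g. #bl -> blE
            g.add_edge(a, b)
    starts = [t for t in active if t.startswith("#")]
    ends = [t for t in active if t.endswith("#")]
    paths = [p for s, e in itertools.product(starts, ends)
             for p in nx.all_simple_paths(g, s, e)]
    return max(paths, key=len) if paths else []

# The well-supported triphones for 'blending' (cf. Figure 9), unordered:
print(order_triphones({"#bl", "blE", "lEn", "End", "ndI", "dIN", "IN#"}))
```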

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m,a} from S_m and add the semantic vector s_a of the affix:

S_a = S_{m,a} + i ⊗ s_a                                            (27)

Here i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a                                                      (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

    [ S_m ]          [ T_m ]
    [ S_a ]  G_a  =  [ T_a ]                                       (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

    [ S_m  ]         [ T_m  ]
    [ S_a1 ]         [ T_a1 ]
    [ S_a2 ]  G   =  [ T_a2 ]
    [ ...  ]         [ ...  ]
    [ S_an ]         [ T_an ]                                      (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
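A sketch of how the augmented matrices of (27) and (30) can be assembled is given below; the helper and its data layout are our own illustrative choices, not the original implementation.

```python
import numpy as np

def augment(S_m, T_m, inflected):
    """Stack base words and inflected words into single augmented S and T
    matrices, cf. (30). 'inflected' maps an inflectional function to
    (row indices of its base words, its semantic vector s_a, the triphone
    matrix T_a of the inflected forms); the names are illustrative."""
    S_blocks, T_blocks = [S_m], [T_m]
    for base_rows, s_a, T_a in inflected.values():
        # (27): semantic vectors of inflected words = base vectors + s_a,
        # i.e. k stacked copies of s_a added row-wise.
        S_blocks.append(S_m[base_rows] + s_a)
        T_blocks.append(T_a)
    return np.vstack(S_blocks), np.vstack(T_blocks)

# Toy example: 3 base words and one inflectional function (PLURAL),
# attested for the first two of them.
rng = np.random.default_rng(3)
S_m = rng.normal(size=(3, 4))
T_m = rng.integers(0, 2, size=(3, 6)).astype(float)
inflected = {"PLURAL": ([0, 1], rng.normal(size=4),
                        rng.integers(0, 2, size=(2, 6)).astype(float))}
S_aug, T_aug = augment(S_m, T_m, inflected)
G = np.linalg.pinv(S_aug) @ T_aug        # a single mapping for all functions
print(S_aug.shape, T_aug.shape, G.shape)
```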

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.
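The cross-validation logic can be sketched as follows, with random toy data. In the real procedure the training folds additionally contain all stems, and accuracy is assessed with the graph-based ordering algorithm rather than with the raw correlations used here.

```python
import numpy as np

def cross_validate(S, T, n_folds=10, rng=None):
    """For each fold, estimate G on the training rows and predict triphone
    vectors for held-out rows; return the correlation of each held-out
    prediction with its target."""
    rng = rng or np.random.default_rng(0)
    order = rng.permutation(len(S))
    folds = np.array_split(order, n_folds)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(order, held_out)
        G = np.linalg.pinv(S[train]) @ T[train]     # mapping trained on this fold
        T_hat = S[held_out] @ G                     # predicted triphone support
        for t_hat, t in zip(T_hat, T[held_out]):
            scores.append(np.corrcoef(t_hat, t)[0, 1])
    return np.array(scores)

# Toy data standing in for the 6236 inflected forms used in the text.
rng = np.random.default_rng(4)
S = rng.normal(size=(60, 12))
T = (rng.random(size=(60, 20)) > 0.5).astype(float)
print(cross_validate(S, T, rng=rng).mean())
```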

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense of blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns) and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.
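To make the role of the edge weights concrete, the sketch below ranks the continuations of the branching vertex End, under the assumption (ours, for illustration) that an edge carries the semantic support of the triphone it leads into; the numerical values are invented and merely echo the kind of asymmetry visible in Figure 9.

```python
# Hypothetical triphone support predicted from the lexomes BLEND + CONTINUOUS;
# '#' marks word boundaries, and all numbers are invented for illustration.
t_hat = {"#bl": 1.0, "blE": 1.0, "lEn": 1.0, "End": 1.0,
         "ndI": 0.51, "ndz": 0.24, "nd#": 0.05, "dIN": 1.0, "IN#": 1.0}

def out_edges(vertex, support):
    """Edges leaving a vertex, assuming an edge's weight is the support
    for the triphone it leads into."""
    succ = [(v, w) for v, w in support.items() if vertex[1:] == v[:-1]]
    return sorted(succ, key=lambda vw: vw[1], reverse=True)

# At the branching point 'End' the support is split over the competing
# continuations, and the weight on the targeted edge drops:
print(out_edges("End", t_hat))   # [('ndI', 0.51), ('ndz', 0.24), ('nd#', 0.05)]
print(out_edges("lEn", t_hat))   # [('End', 1.0)]  -- no branching, full support
```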


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the background of the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients     Estimate     Std. Error   t-value     p-value
Intercept                      -1.9165      0.0592       -32.3908    < 0.0001
Log LDL Edge Weight            -0.1358      0.0400        -3.3960      0.0007
Phrase Rate                     0.0047      0.0021         2.2449      0.0248
Log Ncount                      0.1114      0.0189         5.8837    < 0.0001

B. smooth terms                edf          Ref.df       F-value     p-value
TPRS log Frequency                1.9007       1.9050       7.5495     0.0004
random intercepts word         1149.8839   1323.0000       31.4919   < 0.0001
random intercepts speaker        30.2995      39.0000       34.5579   < 0.0001

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can easily be enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
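For readers who want to reproduce a layout like that of Figure 9 without igraph, a force-directed layout in networkx gives a comparable result; spring_layout (Fruchterman-Reingold) is used here as a rough stand-in for graphopt, and the edge list is a simplified version of the blending graph.

```python
import networkx as nx

# Skeleton of the graph in Figure 9 (simplified), laid out with a
# force-directed algorithm that models edges as springs.
edges = [("#bl", "blE"), ("blE", "lEn"), ("lEn", "End"),
         ("End", "ndI"), ("End", "ndz"), ("End", "nd#"),
         ("ndI", "dIN"), ("ndI", "dId"), ("dIN", "IN#")]
g = nx.DiGraph(edges)
positions = nx.spring_layout(g, seed=0)     # dict: vertex -> (x, y)
for vertex, (x, y) in positions.items():
    print(f"{vertex:>4s}: ({x:+.2f}, {y:+.2f})")
```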

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises first of all an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms; rather, it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncracies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncracies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlək]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
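A miniature version of this set-up can be written down directly. In the sketch below, the targeted vectors are one-hot lexome vectors rather than corpus-derived semantic vectors, and the sentence context is reduced to name-lexome pairs, so the numbers only illustrate how the ambiguity of John is resolved by Welsch; none of the data or helper names come from the original study.

```python
import numpy as np

def trigrams(word):
    """Letter trigrams of an orthographic word, with '#' as boundary marker."""
    w = f"#{word}#"
    return [w[i:i + 3] for i in range(len(w) - 2)]

# Each learning event pairs the trigrams of one orthographic word with the
# lexome of the named entity it refers to (cf. Table 8, heavily simplified).
events = [("John", "JohnClark"), ("Clark", "JohnClark"),
          ("John", "JohnWelsch"), ("Welsch", "JohnWelsch"),
          ("Janet", "JanetClark"), ("Clark", "JanetClark")]
lexomes = ["JohnClark", "JohnWelsch", "JanetClark"]
cues = sorted({t for w, _ in events for t in trigrams(w)})

def cue_vector(word):
    v = np.zeros(len(cues))
    for t in trigrams(word):
        v[cues.index(t)] = 1
    return v

C = np.array([cue_vector(w) for w, _ in events])
L = np.array([[1.0 if lex == m else 0.0 for m in lexomes] for _, lex in events])
F = np.linalg.pinv(C) @ L        # solve CF = L in the least-squares sense

# 'John' on its own is ambiguous between JohnClark and JohnWelsch; summing the
# estimated vectors for 'John' and 'Welsch' singles out JohnWelsch.
print(dict(zip(lexomes, np.round(cue_vector("John") @ F, 2))))
print(dict(zip(lexomes, np.round((cue_vector("John") + cue_vector("Welsch")) @ F, 2))))
```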

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features; for features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production.


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants -> JohnClark write great book about ant
John Clark published a great book about dangerous ants -> JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes -> JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing -> JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords -> JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords -> JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students -> AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students -> AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students -> JanetClark teach graphtheory to second year student


Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated.


Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

(The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]: in their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
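The incremental alternative can be sketched with the Widrow-Hoff (least-mean-squares) rule; the learning rate, toy data, and convergence check below are illustrative choices, not parameters from the study.

```python
import numpy as np

def widrow_hoff(cue_rows, target_rows, eta=0.01, epochs=1):
    """Incremental (trial-by-trial) estimation of a linear mapping F with the
    Widrow-Hoff rule: after each learning event the weights move a small step
    in the direction that reduces the prediction error."""
    n_cues, n_out = cue_rows.shape[1], target_rows.shape[1]
    F = np.zeros((n_cues, n_out))
    for _ in range(epochs):
        for c, t in zip(cue_rows, target_rows):
            error = t - c @ F                  # discrepancy for this event
            F += eta * np.outer(c, error)      # delta-rule update
    return F

# With enough passes over the data, the incremental estimate approaches the
# endstate solution obtained with the generalized inverse.
rng = np.random.default_rng(5)
C = rng.normal(size=(100, 10))
S = C @ rng.normal(size=(10, 5))               # targets generated by a true mapping
F_incremental = widrow_hoff(C, S, eta=0.01, epochs=200)
F_endstate = np.linalg.pinv(C) @ S
print(np.allclose(F_incremental, F_endstate, atol=1e-2))
```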

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the 'architecture of the language faculty' is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference for sequences of words, and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4,000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4,000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.
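This equivalence can be illustrated with a small R sketch; the data are random toy matrices, and the learning rate, number of trials, and matrix sizes are arbitrary choices made for the illustration, not values from the present study. Trial-by-trial Widrow-Hoff updating of a transformation matrix is compared with the estimate obtained directly from the generalized inverse.

    library(MASS)                         # provides ginv(), the Moore-Penrose generalized inverse

    set.seed(42)
    C <- matrix(rnorm(20 * 6), 20, 6)     # toy cue matrix: 20 learning events, 6 form features
    S <- matrix(rnorm(20 * 4), 20, 4)     # toy semantic matrix: 20 events, 4 dimensions

    F_ginv <- ginv(C) %*% S               # end-state estimate used in the present study

    F_wh <- matrix(0, ncol(C), ncol(S))   # incremental Widrow-Hoff estimate, starting from zero
    rate <- 0.01                          # learning rate (arbitrary)
    for (trial in 1:50000) {
      i   <- sample(nrow(C), 1)           # one learning event per trial
      c_i <- C[i, , drop = FALSE]
      s_i <- S[i, , drop = FALSE]
      F_wh <- F_wh + rate * t(c_i) %*% (s_i - c_i %*% F_wh)
    }

    cor(c(F_wh), c(F_ginv))               # should be close to 1: same weights in expectation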

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many take it as an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

$$
\mathbf{a} = (3, 4), \quad \mathbf{b} = (-2, -3) \tag{A1}
$$

Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

$$
\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \tag{A2}
$$

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

$$
\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \tag{A3}
$$

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

$$
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix} \tag{A4}
$$

Such a mapping of A onto B is given by a matrix F:

$$
\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} \tag{A5}
$$

and it is straightforward to verify that indeed

$$
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \tag{A6}
$$

Using matrix notation, we can write

$$
\mathbf{A}\mathbf{F} = \mathbf{B} \tag{A7}
$$

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

$$
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \tag{A8}
$$

For numbers, the inverse of multiplication with s is dividing by s:

$$
\left(\frac{1}{s}\right)(sx) = x \tag{A9}
$$

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:

$$
\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I} \tag{A10}
$$

For nonsquare matrices, the inverse Y⁻¹ is defined such that

$$
\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X} \tag{A11}
$$

We find the square matrix F that maps the square matrix A onto B as follows:

$$
\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B} \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}
\end{aligned} \tag{A12}
$$

Since for the present example A and its inverse happen to be identical (A⁻¹ = A), we obtain (A6).
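The derivation in (A12) is easy to check numerically; the following R lines reproduce the worked example, with the matrices of (A2), (A3), and (A5).

    A <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows are the form vectors a and b
    B <- matrix(c(-5, 4, 2, -1), nrow = 2)   # rows are the semantic vectors x and y

    Fmat <- solve(A) %*% B                   # F = A^{-1} B, cf. (A12)
    Fmat                                     # the matrix F of (A5)
    all.equal(A %*% Fmat, B)                 # TRUE: A F = B, cf. (A6) and (A7)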

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

$$
\mathbf{B}\mathbf{G} = \mathbf{A} \tag{A13}
$$

Solving this equation proceeds exactly as above:

$$
\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A} \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}
\end{aligned} \tag{A14}
$$


The inverse of B is

$$
\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \tag{A15}
$$

and hence we have

$$
\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
=
\begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} = \mathbf{G} \tag{A16}
$$

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
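A minimal R illustration of this point, using a deliberately singular toy matrix rather than any of the matrices of the study:

    library(MASS)

    X <- matrix(c(1, 2, 2, 4), nrow = 2)  # singular: the second column is twice the first
    # solve(X) fails for singular matrices; the generalized inverse is still defined.
    Xp <- ginv(X)                         # X', in the notation of this study
    round(X %*% Xp %*% X, 10)             # recovers X, a defining property of the pseudoinverse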

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

$$
\begin{pmatrix} 3 & 4 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \end{pmatrix} \tag{A17}
$$

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = −5 and o2 = 2 for the output vector (−5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (−2) = −5, and for o2 we have 3 × 2 + 4 × (−1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (−5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

$$
\mathbf{s} = (2, 2) \tag{A18}
$$

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

$$
\begin{pmatrix} 2 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -2 & 2 \end{pmatrix} \tag{A19}
$$
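In R, this productivity amounts to nothing more than a further matrix multiplication; the two lines below reuse the matrix F of (A5) and map the novel form vector of (A18) onto the semantic vector of (A19).

    Fmat <- matrix(c(1, -2, 2, -1), nrow = 2)  # the transformation matrix F of (A5)
    c(2, 2) %*% Fmat                           # yields (-2, 2), cf. (A19)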

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). Form features go in at the input nodes, semantic vectors come out at the output nodes, and the connection weights are the transformation matrix. When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (−2) = −5, and that at o2 is 3 × 2 + 4 × (−1) = 2. The network maps the point (3, 4) onto the point (−5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word-initial boundary symbol) to a word's final triphone (the triphone ending with the word-final boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
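The following R sketch illustrates the core of this procedure on a small hand-made example; the triphones, their network supports, the boundary symbol '#', and the edge set are invented for illustration and do not come from the fitted production network.

    library(igraph)

    # Invented candidate triphones with invented network support.
    support <- c("#sa" = 0.40, "sap" = 0.35, "api" = 0.30, "pid" = 0.25,
                 "id#" = 0.20, "apo" = 0.10, "po#" = 0.05)

    # Edges connect triphones whose final two segments match the next triphone's first two.
    edges <- data.frame(
      from = c("#sa", "sap", "sap", "api", "pid", "apo"),
      to   = c("sap", "api", "apo", "pid", "id#", "po#"))
    g <- graph_from_data_frame(edges, directed = TRUE)

    # All simple paths from the word-initial vertex; keep those reaching a word-final vertex.
    paths    <- all_simple_paths(g, from = "#sa", mode = "out")
    complete <- Filter(function(p) grepl("#$", tail(as_ids(p), 1)), paths)

    # Weight raw path support by path length and select the best-supported path.
    weighted <- sapply(complete, function(p) sum(support[as_ids(p)]) / length(p))
    as_ids(complete[[which.max(weighted)]])   # triphone sequence driving articulation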

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
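A heatmap of this kind takes only a few lines of R; the sketch below uses small random vectors in place of the actual semantic matrix, and the word list and color scale are placeholders rather than those of the published figure.

    # Toy stand-in for the semantic matrix: one random row vector per function word.
    set.seed(1)
    words <- c("we", "us", "she", "her", "across", "over", "in", "of")
    S <- matrix(rnorm(length(words) * 10), nrow = length(words),
                dimnames = list(words, NULL))

    corrs <- cor(t(S))                         # pairwise correlations of the row vectors
    heatmap(corrs, symm = TRUE, margins = c(6, 6))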


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics: Volume 1, An Introduction to the Theory of Formal Languages and Automata; Volume 2, Applications in Linguistic Theory; Volume 3, Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep Speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.

[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a Framework for Understanding Morphological Processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A Network Model of Category Learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dąbrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the Mental Lexicon: A Computational Model for Visual Word Recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The Effects of Feature-Label-Order and Their Implications for Symbolic Learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An Amorphous Model for Morphological Processing in Visual Comprehension Based on Naive Discriminative Learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in Part-of-Speech Tagging with an Application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.

[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom Variation: Experimental Data and a Blueprint of a Computational Model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling, et al., "ndl2: Naive discriminative learning: an implementation in R, R package," 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The Mismeasurement of Mind: Life-Span Changes in Paired-Associate-Learning Scores Reflect the 'Cost' of Learning, Not Cognitive Decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of Speech Perception and Phonological Processing in Reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A Spreading-Activation Theory of Retrieval in Sentence Production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-Semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal, et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of Reading Aloud: Dual-Route and Parallel-Distributed-Processing Approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.

[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Proceedings of the Spontaneous Speech: Data and Analysis, Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic Phonetic Transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL, R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csárdi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA; London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.

[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide Learning for Auditory Comprehension," in Proceedings of the Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss Functions for Image Restoration With Neural Networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol, Ph.D. thesis, University of Alberta, Edmonton, 2010.


Page 20: The Discriminative Lexicon: A Unified Computational Model ...downloads.hindawi.com/journals/complexity/2019/4895891.pdf · as developed within computational linguistics and com-puter

20 Complexity

(C^T C)^{-1} (C^T C) F = (C^T C)^{-1} C^T S
F = (C^T C)^{-1} C^T S          (17)

In this way, matrix inversion is required for a much smaller matrix, C^T C, which is a square matrix of size 40639 × 40639.

To evaluate model performance, we compared the estimated semantic vectors with the targeted semantic vectors using the Pearson correlation. We therefore calculated the 131673 × 131673 correlation matrix for all possible pairs of estimated and targeted semantic vectors. Precision was calculated by comparing predicted vectors with the gold standard provided by the targeted vectors. Recognition was defined to be successful if the correlation of the predicted vector with the targeted gold vector was the highest of all the pairwise correlations of this predicted vector with any of the other gold semantic vectors. Precision, defined as the proportion of correct recognitions divided by the total number of audio tokens, was at 33.61%. For the correctly identified words the mean correlation was 0.72, and for the incorrectly identified words it was 0.55. To place this in perspective, a naive discrimination model with discrete lexomes as output performed at 12%, and a deep convolution network, Mozilla DeepSpeech (https://github.com/mozilla/DeepSpeech, based on Hannun et al. [9]), performed at 6%. The low performance of Mozilla DeepSpeech is primarily due to its dependence on a language model. When presented with utterances instead of single words, it performs remarkably well.
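To make the estimation and evaluation procedure concrete, the following sketch (in Python with numpy; not the code used in the study) illustrates equation (17) and the correlation-based recognition criterion on small random placeholder matrices. The dimensions, the random data, and the helper function corr_matrix are ours for illustration only; the actual C and S are the FBS cue matrix and semantic matrix described above.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(200, 50))   # placeholder cue matrix (tokens x acoustic features)
S = rng.normal(size=(200, 30))   # placeholder targeted semantic vectors

# Equation (17): F = (C'C)^{-1} C'S, solved without forming the inverse explicitly.
F = np.linalg.solve(C.T @ C, C.T @ S)
S_hat = C @ F                    # estimated semantic vectors

def corr_matrix(A, B):
    # Pearson correlations between the rows of A and the rows of B.
    A = (A - A.mean(1, keepdims=True)) / A.std(1, keepdims=True)
    B = (B - B.mean(1, keepdims=True)) / B.std(1, keepdims=True)
    return A @ B.T / A.shape[1]

# Recognition is successful when a token's predicted vector correlates most
# strongly with its own gold-standard vector.
R = corr_matrix(S_hat, S)
precision = np.mean(R.argmax(axis=1) == np.arange(R.shape[0]))
print(f"recognition accuracy: {precision:.2%}")
```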

4.4. Discussion. A problem with naive discriminative learning that has been puzzling for a long time is that measures based on ndl performed well as predictors of processing times [54, 58], whereas accuracy for lexical decisions was low. An accuracy of 27% for a model performing an 11480-classification task is perhaps reasonable, but the lack of precision is unsatisfactory when the goal is to model human visual lexicality decisions. By moving from ndl to ldl, model accuracy is substantially improved (to 59%). At the same time, predictions for reaction times improved considerably as well. As lexicality decisions do not require word identification, further improvement in predicting decision behavior is expected to be possible by considering not only whether the predicted semantic vector is closest to the targeted vector, but also measures such as how densely the space around the predicted semantic vector is populated.

Accuracy for auditory comprehension is lower for the data we considered above, at around 33%. Interestingly, a series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task also for human listeners [121–123]. Correct identification by native speakers of 1000 randomly sampled word tokens from a German corpus of spontaneous conversational speech ranged between 21% and 44% [18]. For both human listeners and automatic speech recognition systems, recognition improves considerably when words are presented in their natural context. Given that ldl with FBSFs performs very well on isolated word recognition, it seems worth investigating further whether the present approach can be developed into a fully-fledged model of auditory comprehension that can take full utterances as input. For a blueprint of how we plan to implement such a model, see Baayen et al. [124].

5. Speech Production

This section examines whether we can predict words' forms from the semantic vectors of S. If this is possible for the present dataset with reasonable accuracy, we have a proof of concept that discriminative morphology is feasible not only for comprehension but also for speech production. The first subsection introduces the computational implementation. The next subsection reports on the model's performance, which is evaluated first for monomorphemic words, then for inflected words, and finally for derived words. Section 5.3 provides further evidence for the production network by showing that as the support from the semantics for the triphones becomes weaker, the amount of time required for articulating the corresponding segments increases.

5.1. Computational Implementation. For a production model, some representational format is required for the output that in turn drives articulation. In what follows, we make use of triphones as output features. Triphones capture part of the contextual dependencies that characterize speech and that render problematic the phoneme as elementary unit of a phonological calculus [93]. Triphones are in many ways not ideal, in that they inherit the limitations that come with discrete units. Other output features, structured along the lines of gestural phonology [125], or time series of movements of key articulators registered with electromagnetic articulography or ultrasound, are on our list for further exploration. For now, we use triphones as a convenience construct, and we will show that, given the support from the semantics for the triphones, the sequence of phones can be reconstructed with high accuracy. As some models of speech production assemble articulatory targets from phone segments (e.g., [126]), the present implementation can be seen as a front-end for this type of model.

Before putting the model to the test, we first clarify the way the model works by means of our toy lexicon with the words one, two, and three. Above, we introduced the semantic matrix S (13), which we repeat here for convenience:

S =
            one     two     three
   one      1.0     0.3     0.4
   two      0.2     1.0     0.1
   three    0.1     0.1     1.0          (18)

as well as an indicator matrix C specifying which triphones occur in which words (12). As in what follows this matrix specifies the triphones targeted for production, we henceforth refer to this matrix as the T matrix:


T =
            #wV   wVn   Vn#   #tu   tu#   #Tr   Tri   ri#
   one       1     1     1     0     0     0     0     0
   two       0     0     0     1     1     0     0     0
   three     0     0     0     0     0     1     1     1          (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

SG = T (20)

Given G, we can predict for any semantic vector s the vector of triphones t that quantifies the support for the triphones provided by s, simply by multiplying s with G:

t = sG (21)

As before, the transformation matrix G is straightforward to estimate. Let S' denote the Moore-Penrose generalized inverse of S. Since

S'SG = S'T          (22)

we have

G = S'T          (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

SG = T̂          (24)

For the present example, the inverse of S, S', is

S' =
            one      two      three
   one      1.10    -0.29    -0.41
   two     -0.21     1.07    -0.02
   three   -0.09    -0.08     1.04          (25)

and the transformation matrix G is

G =
            #wV    wVn    Vn#    #tu    tu#    #Tr    Tri    ri#
   one      1.10   1.10   1.10  -0.29  -0.29  -0.41  -0.41  -0.41
   two     -0.21  -0.21  -0.21   1.07   1.07  -0.02  -0.02  -0.02
   three   -0.09  -0.09  -0.09  -0.08  -0.08   1.04   1.04   1.04          (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.
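For readers who wish to verify the toy example, the sketch below (Python with numpy; illustration only, not the software used in the study) recomputes G and T̂ from the S and T matrices of (18) and (19), with the triphone labels written in plain ASCII.

```python
import numpy as np

words = ["one", "two", "three"]
triphones = ["#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#"]

S = np.array([[1.0, 0.3, 0.4],            # semantic matrix of eq. (18)
              [0.2, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
T = np.array([[1, 1, 1, 0, 0, 0, 0, 0],   # triphone matrix of eq. (19)
              [0, 0, 0, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 1, 1]], dtype=float)

G = np.linalg.pinv(S) @ T                 # eq. (23): G = S'T
T_hat = S @ G                             # eq. (24): predicted triphone support

for word, row in zip(words, np.round(T_hat, 2)):
    print(word, dict(zip(triphones, row)))
```

Rounded to two decimals, G reproduces the values shown in (26), and the best-supported triphones for each word are exactly the targeted ones.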

We made use of the S matrix which we derived from the tasa corpus as described in Section 3. The majority of columns of the 23561 × 23561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.
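The variance-based dimension reduction can be sketched as follows; the function name and code are our own illustration, and the study's implementation may differ in detail.

```python
import numpy as np

def retain_top_variance_columns(S, n_columns):
    """Keep the n_columns columns of S with the highest variance,
    mirroring the reduction step described in the text (the number of
    retained columns is tied to the number of distinct triphones)."""
    variances = S.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:n_columns])
    return S[:, keep], keep

# Hypothetical usage, with S_full the w x 23561 semantic matrix and
# n_triphones around 4500:
#   S_reduced, kept_columns = retain_top_variance_columns(S_full, n_triphones)
```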

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives (elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.
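A minimal sketch of this decoding step is given below. It uses the networkx library rather than the igraph function mentioned above, a hypothetical dictionary of triphone activations as input, and a simplified overlap-based edge construction; it is meant only to illustrate the thresholding and longest-simple-path logic, not to reproduce the study's implementation.

```python
import networkx as nx

def reconstruct_form(triphone_support, threshold=0.99):
    """Keep triphones whose support exceeds the threshold, connect triphones
    that overlap in two segments, and return the longest simple path from a
    word-initial ('#...') to a word-final ('...#') triphone."""
    active = [t for t, a in triphone_support.items() if a > threshold]
    g = nx.DiGraph()
    g.add_nodes_from(active)
    for a in active:
        for b in active:
            if a != b and a[1:] == b[:-1]:      # e.g. '#wV' -> 'wVn' -> 'Vn#'
                g.add_edge(a, b)
    starts = [t for t in active if t.startswith("#")]
    ends = [t for t in active if t.endswith("#")]
    paths = [p for s in starts for e in ends if s != e
             for p in nx.all_simple_paths(g, s, e)]
    return max(paths, key=len) if paths else None

support = {"#wV": 1.0, "wVn": 1.0, "Vn#": 1.0, "#tu": 0.02, "tu#": 0.01}
print(reconstruct_form(support))                # ['#wV', 'wVn', 'Vn#']
```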

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4000 semantic vectors on their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m,a} from S_m and add the semantic vector s_a of the affix:

S_a = S_{m,a} + i ⊗ s_a          (27)

Here i is a unit vector of length k and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a:

S_a G_a = T_a          (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

   [ S_m ]          [ T_m ]
   [     ] G_a   =  [     ]          (29)
   [ S_a ]          [ T_a ]

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

   [ S_m  ]          [ T_m  ]
   [ S_a1 ]          [ T_a1 ]
   [ S_a2 ]   G   =  [ T_a2 ]          (30)
   [ ...  ]          [ ...  ]
   [ S_an ]          [ T_an ]

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
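The construction of the augmented system in (27) and (30) amounts to stacking matrices and adding the affix vector to the pertinent base vectors. The sketch below illustrates this; the data structures passed to the function are our own illustrative choices, not the format used in the study.

```python
import numpy as np

def augmented_mapping(S_m, T_m, inflected):
    """S_m, T_m: semantic and triphone matrices of the monolexomic words.
    inflected: list of (base_rows, s_affix, T_a) triples, one per inflectional
    function: the row indices of the attested base words, the affix's semantic
    vector, and the triphone matrix of the inflected forms."""
    S_blocks, T_blocks = [S_m], [T_m]
    for base_rows, s_affix, T_a in inflected:
        # eq. (27): add the affix vector to each pertinent base vector
        S_blocks.append(S_m[base_rows] + s_affix)
        T_blocks.append(T_a)
    S_aug = np.vstack(S_blocks)            # left-hand side of eq. (30)
    T_aug = np.vstack(T_blocks)            # right-hand side of eq. (30)
    return np.linalg.pinv(S_aug) @ T_aug   # the single joint mapping G
```

Prediction for a novel inflected form then follows (21): its semantic vector, the sum of the base and inflectional vectors, is multiplied with G.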

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the celex database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with celex phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns) and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14 of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z]; sewer, which the model produced with R instead of only R; and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for triphones changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.



Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, calculated by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.

The adverse effects of weaknesses in a word's path in the graph, where the path branches, are of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production such as illustrated in Figure 9 and computational models using temporal self-organizing maps (TSOMs [139–141]).


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients        Estimate    Std. Error     t-value     p-value
Intercept                          -1.9165        0.0592    -32.3908    < 0.0001
Log LDL Edge Weight                -0.1358        0.0400     -3.3960      0.0007
Phrase Rate                         0.0047        0.0021      2.2449      0.0248
Log Ncount                          0.1114        0.0189      5.8837    < 0.0001

B. smooth terms                        edf        Ref.df     F-value     p-value
TPRS log Frequency                  1.9007        1.9050      7.5495      0.0004
random intercepts word           1149.8839     1323.0000     31.4919    < 0.0001
random intercepts speaker          30.2995       39.0000     34.5579    < 0.0001

TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
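For readers who want to experiment with such layouts, the sketch below computes a graphopt layout for a hand-picked fragment of the graph in Figure 9, using python-igraph (assuming a version that exposes the graphopt layout); the edge list is illustrative and does not carry the LDL edge weights.

```python
import igraph as ig

edges = [("bl", "blE"), ("blE", "lEn"), ("lEn", "End"),
         ("End", "ndI"), ("ndI", "dIN"), ("dIN", "IN"),
         ("End", "ndz"), ("ndz", "dz")]
g = ig.Graph.TupleList(edges, directed=True)

coords = g.layout("graphopt")         # physics-inspired iterative layout
for name, (x, y) in zip(g.vs["name"], coords):
    print(f"{name:>4}  x={x:7.2f}  y={y:7.2f}")
```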

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises first of all an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk ([natyrlk])) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
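The logic of this example can be sketched in a few lines. The sketch below uses a handful of hand-constructed learning events instead of the 100 sentence tokens actually used, and simple indicator vectors over named-entity lexomes instead of tasa-based semantic vectors, so the numbers are illustrative only.

```python
import numpy as np

# Each learning event pairs a written word with the named-entity lexome
# of the sentence it occurred in (cf. Table 8).
events = [("John", "JohnClark"), ("Clark", "JohnClark"),
          ("John", "JohnWelsch"), ("Welsch", "JohnWelsch"),
          ("John", "JohnWiggam"), ("Wiggam", "JohnWiggam"),
          ("Janet", "JanetClark"), ("Clark", "JanetClark")]

def trigrams(word):
    padded = "#" + word + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

cues = sorted({t for w, _ in events for t in trigrams(w)})
lexomes = sorted({l for _, l in events})
C = np.array([[1.0 if t in trigrams(w) else 0.0 for t in cues] for w, _ in events])
L = np.array([[1.0 if l == lex else 0.0 for l in lexomes] for _, lex in events])

F = np.linalg.pinv(C) @ L            # solve CF = L

def predict(word):
    c = np.array([1.0 if t in trigrams(word) else 0.0 for t in cues])
    return dict(zip(lexomes, np.round(c @ F, 2)))

print(predict("John"))               # support spread over the John... entities
combined = {k: predict("John")[k] + predict("Welsch")[k] for k in lexomes}
print(combined)                      # JohnWelsch now clearly dominates
```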

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As, in the small world of the present toy example, Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production.


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student


Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].
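Because each of these networks is a linear map, coupling two of them amounts to multiplying their matrices. The sketch below illustrates this for the indirect reading route of Figure 11, K_a followed by H_a; the matrices are random placeholders with invented dimensions, so only the structure of the computation, not the data, reflects the model.

```python
import numpy as np

rng = np.random.default_rng(2)
n_words, n_trigrams, n_triphones, n_sem = 100, 60, 40, 30
C_o = rng.normal(size=(n_words, n_trigrams))    # orthographic cues (letter trigrams)
T_a = rng.normal(size=(n_words, n_triphones))   # auditory targets (triphones)
S = rng.normal(size=(n_words, n_sem))           # semantic vectors

K_a = np.linalg.pinv(C_o) @ T_a     # trigrams  -> auditory targets
H_a = np.linalg.pinv(T_a) @ S       # triphones -> semantics
F_o = np.linalg.pinv(C_o) @ S       # direct visual route for comparison

S_via_audio = C_o @ K_a @ H_a       # reading mediated by auditory targets
S_direct = C_o @ F_o                # direct orthography-to-semantics mapping
```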

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]).


Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147]. In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
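A single incremental update with the Widrow-Hoff rule is easy to state explicitly. The sketch below shows the delta rule for one learning event and a toy training loop; the learning rate, dimensions, and noiseless random data are arbitrary choices for illustration.

```python
import numpy as np

def widrow_hoff_update(W, cue, target, rate=0.01):
    # Delta rule: nudge the weights to reduce this event's squared error.
    error = target - cue @ W
    return W + rate * np.outer(cue, error)

rng = np.random.default_rng(1)
true_map = rng.normal(size=(50, 30))      # mapping to be discovered
W = np.zeros((50, 30))                    # cues x outcomes
for _ in range(5000):
    cue = rng.normal(size=50)
    target = cue @ true_map               # noiseless target for simplicity
    W = widrow_hoff_update(W, cue, target)
print(float(np.abs(W - true_map).max()))  # shrinks towards 0 as events accumulate
```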

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.

Complexity 29

The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented and made to work only because a Latin verb form such as sapivissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference for sequences of words, and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds; learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it to be an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3) \qquad (A1)


[Figure 12 shows the points a, b, x, and y in the x-y plane, with both axes running from -4 to 4.]

Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \qquad (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \qquad (A3)

Let us assume that points a and b represent the forms of two words ω₁ and ω₂, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix} \qquad (A4)

Such a mapping of A onto B is given by a matrix F,

\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} \qquad (A5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \qquad (A6)

Using matrix notation, we can write

\mathbf{A}\mathbf{F} = \mathbf{B} \qquad (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \qquad (A8)

For numbers, the inverse of multiplication with s is dividing by s:

\left(\frac{1}{s}\right)(sx) = x \qquad (A9)

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:

\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I} \qquad (A10)

For nonsquare matrices, the inverse Y⁻¹ is defined such that

\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X} \qquad (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

\begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \end{aligned} \qquad (A12)

Since for the present example A and its inverse happen to be identical (A⁻¹ = A), we obtain (A6).
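For readers who want to verify these calculations, here is a minimal R sketch (the variable name F_mat is ours and stands for the matrix F of (A5)):

A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
F_mat <- solve(A) %*% B      # F = A^(-1) B, cf. (A12); yields rows (1, 2) and (-2, -1)
all.equal(A %*% F_mat, B)    # TRUE: AF = B, cf. (A6)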

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\mathbf{B}\mathbf{G} = \mathbf{A} \qquad (A13)

Solving this equation proceeds exactly as above:

\begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \end{aligned} \qquad (A14)


The inverse of B is

\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \qquad (A15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G} \qquad (A16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
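The following lines illustrate the generalized inverse in R; for the invertible toy matrix B of this appendix, ginv coincides with the exact inverse, so the example is purely illustrative:

library(MASS)                # provides ginv(), the Moore-Penrose generalized inverse
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
G <- ginv(B) %*% A           # G = B'A, cf. (A14); here ginv(B) equals solve(B)
round(G, 2)                  # rows (-0.33, -0.67) and (0.67, 0.33), cf. (A16)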

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3 \;\; 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5 \;\; 2) \qquad (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i₁ and i₂ are set to 3 and 4, respectively. To obtain the activations o₁ = -5 and o₂ = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o₁ we obtain the value 3 × 1 + 4 × -2 = -5, and for o₂ we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\mathbf{s} = (2, 2) \qquad (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2 \;\; 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2 \;\; 2) \qquad (A19)
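In R, this productivity is immediate; F_mat again stands for the matrix F of (A5), and s is the novel form vector of (A18):

F_mat <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)
s <- matrix(c(2, 2), nrow = 1)   # a novel form vector, cf. (A18)
s %*% F_mat                      # its predicted semantic vector (-2, 2), cf. (A19)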

[Figure 13 is a network diagram: two input nodes i₁ and i₂ ("form features go in here") are connected to two output nodes o₁ and o₂ ("semantic vectors come out here") by edges weighted 1, 2, -2, and -1; the connection weights are the transformation matrix.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i₁ and i₂ (in blue), and two output nodes, o₁ and o₂ (in red). When the input vector (3, 4) is presented to the network, the value at o₁ is 3 × 1 + 4 × -2 = -5, and that at o₂ is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths will trivially have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
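A minimal R sketch of this path search with igraph follows. The triphone inventory and support values are invented for a hypothetical form [wVks], and the thresholding and affix-conditional steps described above are omitted; only the path enumeration and the length-weighted ranking are shown.

library(igraph)

support   <- c("#wV" = 0.98, "wVk" = 0.97, "Vks" = 0.96, "ks#" = 0.99)  # toy values
triphones <- names(support)

# an edge connects two triphones when segments 2-3 of the first match segments 1-2 of the second
overlaps <- function(t1, t2) substr(t1, 2, 3) == substr(t2, 1, 2)
edges <- do.call(rbind, lapply(triphones, function(t1)
  do.call(rbind, lapply(triphones, function(t2)
    if (t1 != t2 && overlaps(t1, t2)) c(t1, t2)))))
g <- graph_from_edgelist(edges, directed = TRUE)

initial <- triphones[startsWith(triphones, "#")]
final   <- triphones[endsWith(triphones, "#")]
paths   <- unlist(lapply(initial, function(v)
  all_simple_paths(g, from = v, to = final)), recursive = FALSE)

# weight the raw path support by path length and pick the best-supported path
weighted <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
as_ids(paths[[which.max(weighted)]])     # "#wV" "wVk" "Vks" "ks#"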

[Figure 14 is a heatmap, with color key and histogram, of the pairwise correlations (roughly -0.1 to 0.25) of the semantic vectors of English pronouns and prepositions.]

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.
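The following lines sketch this repair step under the same kind of toy assumptions (the triphone names are invented): a dead-end path is completed by adding the unseen boundary triphone with zero network support.

library(igraph)
g <- graph_from_edgelist(rbind(c("#lI", "lIs"), c("lIs", "Ist")), directed = TRUE)
support <- c("#lI" = 0.9, "lIs" = 0.8, "Ist" = 0.7)

g <- add_vertices(g, 1, name = "st#")     # boundary triphone unseen in training
g <- add_edges(g, c("Ist", "st#"))        # connect the dead end to it
support["st#"] <- 0                       # zero network support for the novel vertex

as_ids(all_simple_paths(g, from = "#lI", to = "st#")[[1]])   # "#lI" "lIs" "Ist" "st#"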

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85]; for code, see the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for test classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis, Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL, R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA; London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215-216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol, Ph.D. thesis, University of Alberta, Edmonton, 2010.


\mathbf{T} = \bordermatrix{ & \#wV & wVn & Vn\# & \#tu & tu\# & \#Tr & Tri & ri\# \cr one & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \cr two & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \cr three & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 } \qquad (19)

For production, our interest is in the matrix G that transforms the row vectors of S into the row vectors of T, i.e., we need to solve

\mathbf{S}\mathbf{G} = \mathbf{T} \qquad (20)

Given G, we can predict for any semantic vector s the vector of triphones t̂ that quantifies the support for the triphones provided by s, simply by multiplying s with G:

\hat{\mathbf{t}} = \mathbf{s}\mathbf{G} \qquad (21)

As before, the transformation matrix G is straightforward to estimate. Let S′ denote the Moore-Penrose generalized inverse of S. Since

\mathbf{S}'\mathbf{S}\mathbf{G} = \mathbf{S}'\mathbf{T} \qquad (22)

we have

\mathbf{G} = \mathbf{S}'\mathbf{T} \qquad (23)

Given G, we can predict the triphone matrix T̂ from the semantic matrix S:

\mathbf{S}\mathbf{G} = \hat{\mathbf{T}} \qquad (24)
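As a concrete illustration, the following R sketch computes G and the predicted triphone matrix for a toy version of the one/two/three example. The triphone matrix follows (19); the semantic matrix used here is invented, since only its inverse is shown below in the text.

library(MASS)
words <- c("one", "two", "three")
tri   <- c("#wV", "wVn", "Vn#", "#tu", "tu#", "#Tr", "Tri", "ri#")
T_mat <- matrix(c(1,1,1,0,0,0,0,0,
                  0,0,0,1,1,0,0,0,
                  0,0,0,0,0,1,1,1),
                nrow = 3, byrow = TRUE, dimnames = list(words, tri))
S <- matrix(c(1.0, 0.2, 0.3,
              0.2, 1.0, 0.1,
              0.3, 0.1, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(words, words))  # invented values

G     <- ginv(S) %*% T_mat    # G = S'T, cf. (23)
T_hat <- S %*% G              # predicted triphone supports, cf. (24)
round(T_hat, 2)               # closely approximates T_mat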

For the present example, the inverse of S, S′, is

\mathbf{S}' = \bordermatrix{ & one & two & three \cr one & 1.10 & -0.29 & -0.41 \cr two & -0.21 & 1.07 & -0.02 \cr three & -0.09 & -0.08 & 1.04 } \qquad (25)

and the transformation matrix G is

\mathbf{G} = \bordermatrix{ & \#wV & wVn & Vn\# & \#tu & tu\# & \#Tr & Tri & ri\# \cr one & 1.10 & 1.10 & 1.10 & -0.29 & -0.29 & -0.41 & -0.41 & -0.41 \cr two & -0.21 & -0.21 & -0.21 & 1.07 & 1.07 & -0.02 & -0.02 & -0.02 \cr three & -0.09 & -0.09 & -0.09 & -0.08 & -0.08 & 1.04 & 1.04 & 1.04 } \qquad (26)

For this simple example, T̂ is virtually identical to T. For realistic data, T̂ will not be identical to T, but will be an approximation of it that is optimal in the least squares sense. The triphones with the strongest support are expected to be the most likely to be the triphones defining a word's form.

We made use of the S matrix which we derived from the TASA corpus, as described in Section 3. The majority of columns of the 23,561 × 23,561 matrix S show very small deviations from zero and hence are uninformative. As before, we reduced the number of columns of S by removing columns with very low variance. Here, one option is to remove all columns with a variance below a preset threshold θ. However, to avoid adding a threshold as a free parameter, we set the number of columns retained to the number of different triphones in a given dataset, a number which is around n = 4500. In summary, S denotes a w × n matrix that specifies for each of w words an n-dimensional semantic vector.
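This variance-based column selection can be sketched in a few lines of R (n, the number of distinct triphones, is an assumption, and S stands for the full semantic matrix, e.g., the toy S of the sketch above):

n        <- 4500                               # number of distinct triphones
col_var  <- apply(S, 2, var)                   # variance of each column of S
keep     <- order(col_var, decreasing = TRUE)[seq_len(min(n, ncol(S)))]
S_pruned <- S[, keep, drop = FALSE]            # retain the n highest-variance columns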

5.2. Model Performance

5.2.1. Performance on Monolexomic Words. We first examined model performance on a dataset comprising monolexomic words that did not carry any inflectional exponents. This dataset of 3987 words comprised three irregular comparatives

(elder, less, and more), two irregular superlatives (least, most), 28 irregular past tense forms, and one irregular past participle (smelt). For this set of words, we constructed a 3987 × 4446 matrix of semantic vectors S_m (a submatrix of the S matrix introduced above in Section 3) and a 3987 × 4446 triphone matrix T_m (a submatrix of T). The number of columns of S_m was set to the number of different triphones (the column dimension of T_m). Those column vectors of S_m were retained that had the 4446 highest variances. We then estimated the transformation matrix G and used this matrix to predict the triphones that define words' forms.

We evaluated model performance in two ways. First, we inspected whether the triphones with maximal support were indeed the targeted triphones. This turned out to be the case for all words. Targeted triphones had an activation value close to one, and nontargeted triphones an activation close to zero. As the triphones are not ordered, we also investigated whether the sequence of phones could be constructed correctly for these words. To this end, we set a threshold of 0.99, extracted all triphones with an activation exceeding this threshold, and used the all_simple_paths function (a path is simple if the vertices it visits are not visited more than once) from the igraph package [127] to calculate all paths starting with any left-edge triphone in the set of extracted triphones. From the resulting set of paths, we selected the


longest path, which invariably was perfectly aligned with the sequence of triphones that defines words' forms.

We also evaluated model performance with a second, more general heuristic algorithm that also makes use of the same algorithm from graph theory. Our algorithm sets up a graph with vertices collected from the triphones that are best supported by the relevant semantic vectors, and considers all paths it can find that lead from an initial triphone to a final triphone. This algorithm, which is presented in more detail in the appendix, and which is essential for novel complex words, produced the correct form for 3982 out of 3987 words. It selected a shorter form for five words: int for intent, lin for linnen, mis for mistress, oint for ointment, and pin for pippin. The correct forms were also found, but ranked second due to a simple length penalty that is implemented in the algorithm.

From these analyses it is clear that mapping nearly 4000 semantic vectors onto their corresponding triphone paths can be accomplished with very high accuracy for English monolexomic words. The question to be addressed next is how well this approach works for complex words. We first address inflected forms, and limit ourselves here to the inflected variants of the present set of 3987 monolexomic words.

5.2.2. Performance on Inflected Words. Following the classic distinction between inflection and word formation, inflected words did not receive semantic vectors of their own. Nevertheless, we can create semantic vectors for inflected words by adding the semantic vector of an inflectional function to the semantic vector of its base. However, there are several ways in which the mapping from meaning to form for inflected words can be set up. To explain this, we need some further notation.

Let S_m and T_m denote the submatrices of S and T that contain the semantic and triphone vectors of monolexomic words. Assume that a subset of k of these monolexomic words is attested in the training corpus with inflectional function a. Let T_a denote the matrix with the triphone vectors of these inflected words, and let S_a denote the corresponding semantic vectors. To obtain S_a, we take the pertinent submatrix S_{m,a} from S_m and add the semantic vector s_a of the affix:

\mathbf{S}_a = \mathbf{S}_{m,a} + \mathbf{i} \otimes \mathbf{s}_a \qquad (27)

Here, i is a unit vector of length k, and ⊗ is the generalized Kronecker product, which in (27) stacks k copies of s_a row-wise. As a first step, we could define a separate mapping G_a for each inflectional function a,

\mathbf{S}_a \mathbf{G}_a = \mathbf{T}_a \qquad (28)

but in this set-up, learning of inflected words does not benefit from the knowledge of the base words. This can be remedied by a mapping for augmented matrices that contain the row vectors for both base words and inflected words:

\begin{bmatrix} \mathbf{S}_m \\ \mathbf{S}_a \end{bmatrix} \mathbf{G}_a = \begin{bmatrix} \mathbf{T}_m \\ \mathbf{T}_a \end{bmatrix} \qquad (29)

The dimension of G_a (length of semantic vector by length of triphone vector) remains the same, so this option is not more costly than the preceding one. Nevertheless, for each inflectional function a separate large matrix is required. A much more parsimonious solution is to build augmented matrices for base words and all inflected words jointly:

\begin{bmatrix} \mathbf{S}_m \\ \mathbf{S}_{a_1} \\ \mathbf{S}_{a_2} \\ \vdots \\ \mathbf{S}_{a_n} \end{bmatrix} \mathbf{G} = \begin{bmatrix} \mathbf{T}_m \\ \mathbf{T}_{a_1} \\ \mathbf{T}_{a_2} \\ \vdots \\ \mathbf{T}_{a_n} \end{bmatrix} \qquad (30)

The dimension of G is identical to that of G_a, but now all inflectional functions are dealt with by a single mapping. In what follows, we report the results obtained with this mapping.
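A compact R sketch of (27) and (30); all matrices here are invented stand-ins for S_m, T_m, and the blocks of inflected words, with toy dimensions.

library(MASS)
set.seed(1)
n_sem <- 6; n_tri <- 10
S_m <- matrix(rnorm(4 * n_sem), 4, n_sem)            # 4 toy base words
T_m <- matrix(rbinom(4 * n_tri, 1, 0.3), 4, n_tri)
s_a <- rnorm(n_sem)                                  # semantic vector of one affix
T_a <- matrix(rbinom(4 * n_tri, 1, 0.3), 4, n_tri)   # forms of the inflected words

# (27): add the affix vector to each base word's semantic vector
S_a <- S_m + matrix(s_a, nrow = nrow(S_m), ncol = n_sem, byrow = TRUE)

# (30): one augmented mapping for base words and inflected words jointly
S_aug <- rbind(S_m, S_a)
T_aug <- rbind(T_m, T_a)
G <- ginv(S_aug) %*% T_aug                           # G = S'T for the augmented matrices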

We selected 6595 inflected variants which met the criterion that the frequency of the corresponding inflectional function was at least 50. This resulted in a dataset with 91 comparatives, 97 superlatives, 2401 plurals, 1333 continuous forms (e.g., walking), 859 past tense forms, and 1086 forms classed as perfective (past participles), as well as 728 third person verb forms (e.g., walks). Many forms can be analyzed as either past tenses or as past participles. We followed the analyses of the treetagger, which resulted in a dataset in which both inflectional functions are well-attested.

Following (30), we obtained an (augmented) 10,582 × 5483 semantic matrix S, where, as before, we retained the 5483 columns with the highest column variance. The (augmented) triphone matrix T for this dataset had the same dimensions.

Inspection of the activations of the triphones revealed that targeted triphones had top activations for 85% of the monolexomic words and 86% of the inflected words. The proportion of words with at most one intruding triphone was 97% for both monolexomic and inflected words. The graph-based algorithm performed with an overall accuracy of 94%; accuracies broken down by morphology revealed an accuracy of 99% for the monolexomic words and an accuracy of 92% for inflected words. One source of errors for the production algorithm is inconsistent coding in the CELEX database. For instance, the stem of prosper is coded as having a final schwa followed by r, but the inflected forms are coded without the r, creating a mismatch between a partially rhotic stem and completely non-rhotic inflected variants.

We next put model performance to a more stringent test by using 10-fold cross-validation for the inflected variants. For each fold, we trained on all stems and 90% of all inflected forms, and then evaluated performance on the 10% of inflected forms that were not seen in training. In this way, we can ascertain the extent to which our production system (network plus graph-based algorithm for ordering triphones) is productive. We excluded from the cross-validation procedure irregular inflected forms, forms with CELEX phonological forms with inconsistent within-paradigm rhoticism, as well as forms the stem of which was not available in the training set. Thus, cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t̂.
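Schematically, this cross-validation loop can be written as follows in R; the four matrices are invented stand-ins for the monolexomic and inflected data described in the text.

library(MASS)
set.seed(1)
S_m    <- matrix(rnorm(40 * 20), 40, 20);  T_m    <- matrix(rbinom(40 * 30, 1, 0.2), 40, 30)
S_infl <- matrix(rnorm(50 * 20), 50, 20);  T_infl <- matrix(rbinom(50 * 30, 1, 0.2), 50, 30)

folds <- sample(rep(1:10, length.out = nrow(S_infl)))
for (f in 1:10) {
  train   <- folds != f
  S_train <- rbind(S_m, S_infl[train, , drop = FALSE])   # all stems + 90% of inflected forms
  T_train <- rbind(T_m, T_infl[train, , drop = FALSE])
  G       <- ginv(S_train) %*% T_train                   # transformation matrix for this fold
  T_pred  <- S_infl[!train, , drop = FALSE] %*% G        # held-out predictions, cf. (21)
  # T_pred is then passed, row by row, to the graph-based triphone sequencing step
}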

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus, happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns) and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset, and that a derived word's base word was included in the training set.

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z], sewer, which the model produced with R instead of only R, and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for trigrams changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI, henceforth 'branching segment'), for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
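The model summarized in Table 7 can be specified along the following lines; this is a minimal sketch assuming the mgcv implementation of generalized additive mixed models, with an illustrative data frame (buckeye) and column names that are ours rather than part of the original materials.

library(mgcv)

# Sketch of the model summarized in Table 7: one row per word token,
# with illustrative column names.
fit_duration_gamm <- function(buckeye) {
  bam(LogRelDur ~ LogEdgeWeight + PhraseRate + LogNcount +
        s(LogFrequency)       +   # thin plate regression spline for log frequency
        s(Word,    bs = "re") +   # random intercepts for word
        s(Speaker, bs = "re"),    # random intercepts for speaker
      data = buckeye)
}
# summary(fit_duration_gamm(buckeye)) yields output organized as in Table 7.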

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest given the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients    Estimate     Std. Error   t-value     p-value
Intercept                      -1.9165       0.0592     -32.3908    < 0.0001
Log LDL Edge Weight            -0.1358       0.0400      -3.3960      0.0007
Phrase Rate                     0.0047       0.0021       2.2449      0.0248
Log Ncount                      0.1114       0.0189       5.8837    < 0.0001
B. smooth terms                    edf       Ref.df      F-value     p-value
TPRS log Frequency              1.9007       1.9050       7.5495      0.0004
random intercepts word       1149.8839    1323.0000      31.4919    < 0.0001
random intercepts speaker      30.2995      39.0000      34.5579    < 0.0001

There is remarkable convergence between the directed graphs for speech production such as illustrated in Figure 9 and computational models using temporal self-organizing maps (TSOMs; [139–141]). TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt/, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
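As an illustration, the sketch below lays out a toy fragment of the graph of Figure 9 with graphopt as implemented in igraph (layout_with_graphopt); the vertex names and edge weights are illustrative rather than those underlying the published figure.

library(igraph)

# A toy fragment of the triphone graph for "blending"; weights are illustrative.
edges <- data.frame(
  from   = c("#bl", "blE", "lEn", "End", "ndI", "dIN"),
  to     = c("blE", "lEn", "End", "ndI", "dIN", "IN#"),
  weight = c(1, 1, 1, 0.3, 1, 1)
)
g      <- graph_from_data_frame(edges, directed = TRUE)
coords <- layout_with_graphopt(g)   # iterative, physics-based layout (mass, charge, springs)
plot(g, layout = coords, edge.label = E(g)$weight, edge.arrow.size = 0.3)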

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises first of all an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise, articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically, compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors by training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
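In code, this toy simulation amounts to solving CF = L (here with the generalized inverse) and inspecting the estimated semantic vectors; the sketch below uses illustrative object names.

library(MASS)  # ginv()

# C: words (rows) by letter trigrams (columns); L: words (rows) by lexomes
# (JohnClark, write, ...), both constructed from the 100 training tokens.
estimate_semantics <- function(C, L) {
  Fmat <- ginv(C) %*% L    # solve CF = L
  C %*% Fmat               # L_hat: estimated semantic vectors, one row per word form
}
# For the sequentially read name "John Welsch", the rows of L_hat for "John" and
# "Welsch" are summed before being correlated with the targeted lexome vectors.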

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input, at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students
JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student


Figure 10: Correlations of the estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.


Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
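For concreteness, a single trial-by-trial update with the Widrow-Hoff rule can be sketched as follows; the learning rate eta and the object names are illustrative.

# One trial-by-trial update of the mapping W (cues in rows, outcomes in columns)
# with the Widrow-Hoff learning rule.
widrow_hoff_update <- function(W, cue, target, eta = 0.01) {
  cue    <- matrix(cue,    nrow = 1)          # row vector of input features
  target <- matrix(target, nrow = 1)          # row vector of output features
  W + eta * t(cue) %*% (target - cue %*% W)   # move W towards the observed target
}
# Looping this update over learning events in their temporal order gives, in the
# mean, the same mapping as the present estimates based on the generalized inverse.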

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension (we note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference for sequences of words, and possibly compounds and fixed expressions, for further research).

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are, of course, context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep the problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40, 41, 42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning, and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

a = (3, 4),   b = (-2, -3).   (A.1)


Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

A = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}.   (A.2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

B = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.   (A.3)

Let us assume that points a and b represent the forms of two words ω_1 and ω_2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}.   (A.4)

Such a mapping of A onto B is given by a matrix F:

F = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix},   (A.5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.   (A.6)

Using matrix notation, we can write

AF = B.   (A.7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}.   (A.8)

For numbers, the inverse of multiplication with s is dividing by s:

(1/s)(sx) = x.   (A.9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

X^{-1}X = XX^{-1} = I.   (A.10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

Y^{-1}(YX) = (XY)Y^{-1} = X.   (A.11)

We find the square matrix F that maps the square matrices A onto B as follows:

AF = B,
A^{-1}AF = A^{-1}B,
IF = A^{-1}B,
F = A^{-1}B.   (A.12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A.6).
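The worked example can be verified directly in R; the following is a small sketch (solve() returns the inverse of a nonsingular square matrix).

A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)   # rows: the points a and b
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)   # rows: the points x and y

Fmat <- solve(A) %*% B   # the transformation matrix F of (A.5)
A %*% Fmat               # reproduces B, confirming (A.6)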

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

BG = A.   (A.13)

Solving this equation proceeds exactly as above

BG = A,
B^{-1}BG = B^{-1}A,
IG = B^{-1}A,
G = B^{-1}A.   (A.14)


The inverse of B is

B^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix},   (A.15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = G.   (A.16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X'.
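A brief sketch of the generalized inverse at work on a singular matrix (the example matrix is ours, not taken from the text):

library(MASS)   # ginv(): Moore-Penrose generalized inverse

X <- matrix(c(1, 2, 2, 4), nrow = 2, byrow = TRUE)   # singular: row 2 is twice row 1
ginv(X)               # defined even though solve(X) would fail
X %*% ginv(X) %*% X   # reproduces X, one of the defining Moore-Penrose properties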

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3   4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5   2).   (A.17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i_1 and i_2 are set to 3 and 4, respectively. To obtain the activations o_1 = -5 and o_2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o_1 we obtain the value 3 × 1 + 4 × -2 = -5, and for o_2 we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

s = (2, 2),   (A.18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2   2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2   2).   (A.19)


Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i_1 and i_2 (in blue), and two output nodes, o_1 and o_2 (in red). When the input vector (3, 4) is presented to the network, the value at o_1 is 3 × 1 + 4 × -2 = -5, and that at o_2 is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
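A sketch of this path-selection step, using igraph and assuming a directed graph g whose edge attribute weight holds the network support, together with vectors of word-initial and word-final vertex names, might look as follows (the function and object names are ours):

library(igraph)

# Rank all simple paths from word-initial to word-final triphones by their
# length-normalized support and return the best-supported path.
best_triphone_path <- function(g, initial, final) {
  paths <- unlist(lapply(initial, function(v)
    all_simple_paths(g, from = v, to = final)), recursive = FALSE)
  if (length(paths) == 0) return(NULL)      # no path found (see below)
  support <- sapply(paths, function(p) {
    e <- E(g, path = p)                     # the edges along this path
    sum(e$weight) / length(p)               # weight raw support by path length
  })
  paths[[which.max(support)]]               # the acoustic image driving articulation
}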

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010

36 Complexity

[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983

Complexity 37

[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005

38 Complexity

[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018

Complexity 39

[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R Jackendoff Semantic StructuresMITPress Cambridge 1990[155] J J McCarthy ldquoA prosodic theory of non-concatenative mor-

phologyrdquo Linguistic Inquiry vol 12 pp 373ndash418 1981[156] A Ussishkin ldquoA Fixed Prosodic Theory of Nonconcatenative

Templaticmorphologyrdquo Natural Language amp Linguistic Theoryvol 23 no 1 pp 169ndash218 2005

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 22: The Discriminative Lexicon: A Unified Computational Model ...downloads.hindawi.com/journals/complexity/2019/4895891.pdf · as developed within computational linguistics and com-puter

22 Complexity

longest path which invariably was perfectly aligned with thesequence of triphones that defines wordsrsquo forms

We also evaluated model performance with a secondmore general heuristic algorithm that also makes use of thesame algorithm from graph theory Our algorithm sets up agraph with vertices collected from the triphones that are bestsupported by the relevant semantic vectors and considers allpaths it can find that lead from an initial triphone to a finaltriphoneThis algorithm which is presented inmore detail inthe appendix and which is essential for novel complex wordsproduced the correct form for 3982 out of 3987 words Itselected a shorter form for five words int for intent lin forlinnen mis for mistress oint for ointment and pin for pippinThe correct forms were also found but ranked second due toa simple length penalty that is implemented in the algorithm

From these analyses it is clear that mapping nearly4000 semantic vectors on their corresponding triphone pathscan be accomplished with very high accuracy for Englishmonolexomic words The question to be addressed nextis how well this approach works for complex words Wefirst address inflected forms and limit ourselves here to theinflected variants of the present set of 3987 monolexomicwords

522 Performance on Inflected Words Following the classicdistinction between inflection and word formation inflectedwords did not receive semantic vectors of their own Never-theless we can create semantic vectors for inflected words byadding the semantic vector of an inflectional function to thesemantic vector of its base However there are several ways inwhich themapping frommeaning to form for inflectedwordscan be set up To explain this we need some further notation

Let S119898 and T119898 denote the submatrices of S and T thatcontain the semantic and triphone vectors of monolexomicwords Assume that a subset of 119896 of these monolexomicwords is attested in the training corpus with inflectionalfunction 119886 Let T119886 denote the matrix with the triphonevectors of these inflected words and let S119886 denote thecorresponding semantic vectors To obtain S119886 we take thepertinent submatrix S119898119886 from S119898 and add the semantic vectors119886 of the affix

S119886 = S119898119886 + i⨂ s119886 (27)

Here i is a unit vector of length 119896 and ⨂ is the generalizedKronecker product which in (27) stacks 119896 copies of s119886 row-wise As a first step we could define a separate mapping G119886for each inflectional function 119886

S119886G119886 = T119886 (28)

but in this set-up learning of inflected words does not benefitfrom the knowledge of the base words This can be remediedby a mapping for augmented matrices that contain the rowvectors for both base words and inflected words[S119898

S119886]G119886 = [T119898

T119886] (29)

The dimension of G119886 (length of semantic vector by lengthof triphone vector) remains the same so this option is notmore costly than the preceding one Nevertheless for eachinflectional function a separate large matrix is required Amuch more parsimonious solution is to build augmentedmatrices for base words and all inflected words jointly

[[[[[[[[[[S119898S1198861S1198862S119886119899

]]]]]]]]]]G = [[[[[[[[[[

T119898T1198861T1198862T119886119899

]]]]]]]]]](30)

The dimension of G is identical to that of G119886 but now allinflectional functions are dealt with by a single mappingIn what follows we report the results obtained with thismapping

We selected 6595 inflected variants which met the cri-terion that the frequency of the corresponding inflectionalfunction was at least 50 This resulted in a dataset with 91comparatives 97 superlatives 2401 plurals 1333 continuousforms (eg walking) 859 past tense forms and 1086 formsclassed as perfective (past participles) as well as 728 thirdperson verb forms (eg walks) Many forms can be analyzedas either past tenses or as past participles We followed theanalyses of the treetagger which resulted in a dataset inwhichboth inflectional functions are well-attested

Following (30) we obtained an (augmented) 10582 times5483 semantic matrix S whereas before we retained the 5483columns with the highest column variance The (augmented)triphone matrix T for this dataset had the same dimensions

Inspection of the activations of the triphones revealedthat targeted triphones had top activations for 85 of themonolexomic words and 86 of the inflected words Theproportion of words with at most one intruding triphone was97 for both monolexomic and inflected words The graph-based algorithm performed with an overall accuracy of 94accuracies broken down bymorphology revealed an accuracyof 99 for the monolexomic words and an accuracy of 92for inflected words One source of errors for the productionalgorithm is inconsistent coding in the celex database Forinstance the stem of prosper is coded as having a final schwafollowed by r but the inflected forms are coded without ther creating a mismatch between a partially rhotic stem andcompletely non-rhotic inflected variants

We next put model performance to a more stringent testby using 10-fold cross-validation for the inflected variantsFor each fold we trained on all stems and 90 of all inflectedforms and then evaluated performance on the 10of inflectedforms that were not seen in training In this way we can ascer-tain the extent to which our production system (network plusgraph-based algorithm for ordering triphones) is productiveWe excluded from the cross-validation procedure irregularinflected forms forms with celex phonological forms withinconsistent within-paradigm rhoticism as well as forms thestem of which was not available in the training set Thus

Complexity 23

cross-validation was carried out for a total of 6236 inflectedforms

As before the semantic vectors for inflected words wereobtained by addition of the corresponding content and inflec-tional semantic vectors For each training set we calculatedthe transformationmatrixG from the S andTmatrices of thattraining set For an out-of-bag inflected form in the test setwe calculated its semantic vector and multiplied this vectorwith the transformation matrix (using (21)) to obtain thepredicted triphone vector t

The proportion of forms that were predicted correctlywas 062 The proportion of forms ranked second was 025Forms that were incorrectly ranked first typically were otherinflected forms (including bare stems) that happened toreceive stronger support than the targeted form Such errorsare not uncommon in spoken English For instance inthe Buckeye corpus [110] closest is once reduced to closFurthermore Dell [90] classified forms such as concludementfor conclusion and he relax for he relaxes as (noncontextual)errors

The algorithm failed to produce the targeted form for3 of the cases Examples of the forms produced instead arethe past tense for blazed being realized with [zId] insteadof [zd] the plural mouths being predicted as having [Ts] ascoda rather than [Ds] and finest being reduced to finst Thevoiceless production formouth does not follow the dictionarynorm but is used as attested by on-line pronunciationdictionaries Furthermore the voicing alternation is partlyunpredictable (see eg [128] for final devoicing in Dutch)and hence model performance here is not unreasonable Wenext consider model accuracy for derived words

523 Performance on Derived Words When building thevector space model we distinguished between inflectionand word formation Inflected words did not receive theirown semantic vectors By contrast each derived word wasassigned its own lexome together with a lexome for itsderivational function Thus happiness was paired with twolexomes happiness and ness Since the semantic matrix forour dataset already contains semantic vectors for derivedwords we first investigated how well forms are predictedwhen derived words are assessed along with monolexomicwords without any further differentiation between the twoTo this end we constructed a semantic matrix S for 4885words (rows) by 4993 (columns) and constructed the trans-formation matrix G from this matrix and the correspondingtriphone matrix T The predicted form vectors of T =SG supported the targeted triphones above all other tri-phones without exception Furthermore the graph-basedalgorithm correctly reconstructed 99 of all forms withonly 5 cases where it assigned the correct form secondrank

Next we inspected how the algorithm performs whenthe semantic vectors of derived forms are obtained fromthe semantic vectors of their base words and those of theirderivational lexomes instead of using the semantic vectors ofthe derived words themselves To allow subsequent evalua-tion by cross-validation we selected those derived words that

contained an affix that occured at least 30 times in our data set(again (38) agent (177) ful (45) instrument (82) less(54) ly (127) and ness (57)) to a total of 770 complex words

We first combined these derived words with the 3987monolexomic words For the resulting 4885 words the Tand S matrices were constructed from which we derivedthe transformation matrix G and subsequently the matrixof predicted triphone strengths T The proportion of wordsfor which the targeted triphones were the best supportedtriphones was 096 and the graph algorithm performed withan accuracy of 989

In order to assess the productivity of the system weevaluated performance on derived words by means of 10-fold cross-validation For each fold we made sure that eachaffix was present proportionally to its frequency in the overalldataset and that a derived wordrsquos base word was included inthe training set

We first examined performance when the transformationmatrix is estimated from a semantic matrix that contains thesemantic vectors of the derived words themselves whereassemantic vectors for unseen derived words are obtained bysummation of the semantic vectors of base and affix It turnsout that this set-up results in a total failure In the presentframework the semantic vectors of derived words are tooidiosyncratic and too finely tuned to their own collocationalpreferences They are too scattered in semantic space tosupport a transformation matrix that supports the triphonesof both stem and affix

We then used exactly the same cross-validation procedureas outlined above for inflected words constructing semanticvectors for derived words from the semantic vectors of baseand affix and calculating the transformation matrix G fromthese summed vectors For unseen derivedwordsGwas usedto transform the semantic vectors for novel derived words(obtained by summing the vectors of base and affix) intotriphone space

For 75 of the derived words the graph algorithmreconstructed the targeted triphones For 14 of the derivednouns the targeted form was not retrieved These includecases such as resound which the model produced with [s]instead of the (unexpected) [z] sewer which the modelproduced with R instead of only R and tumbler where themodel used syllabic [l] instead of nonsyllabic [l] given in thecelex target form

53 Weak Links in the Triphone Graph and Delays in SpeechProduction An important property of the model is that thesupport for trigrams changes where a wordrsquos graph branchesout for different morphological variants This is illustratedin Figure 9 for the inflected form blending Support for thestem-final triphone End is still at 1 but then blend can endor continue as blends blended or blending The resultinguncertainty is reflected in the weights on the edges leavingEnd In this example the targeted ing form is driven by theinflectional semantic vector for continuous and hence theedge to ndI is best supported For other forms the edgeweights will be different and hence other paths will be bettersupported

24 Complexity

1

1

1051

024023

03

023

1

001 001

1

02

03

023

002

1 0051

bl

blE

lEn

EndndI

dIN

IN

INI

NIN

dId

Id

IdI

dII

IIN

ndz

dz

dzI

zIN

nd

EnI

nIN

lEI

EIN

Figure 9 The directed graph for blending Vertices represent triphones including triphones such as IIN that were posited by the graphalgorithm to bridge potentially relevant transitions that are not instantiated in the training data Edges are labelled with their edge weightsThe graph the edge weights of which are specific to the lexomes blend and continuous incorporates not only the path for blending butalso the paths for blend blends and blended

The uncertainty that arises at branching points in thetriphone graph is of interest in the light of several exper-imental results For instance inter keystroke intervals intyping become longer at syllable and morph boundaries[129 130] Evidence from articulography suggests variabilityis greater at morph boundaries [131] Longer keystroke exe-cution times and greater articulatory variability are exactlywhat is expected under reduced edge support We thereforeexamined whether the edge weights at the first branchingpoint are predictive for lexical processing To this end weinvestigated the acoustic duration of the segment at the centerof the first triphone with a reduced LDL edge weight (in thepresent example d in ndI henceforth lsquobranching segmentrsquo)for those words in our study that are attested in the Buckeyecorpus [110]

The dataset we extracted from the Buckeye corpus com-prised 15105 tokens of a total of 1327 word types collectedfrom 40 speakers For each of these words we calculatedthe relative duration of the branching segment calculatedby dividing segment duration by word duration For thepurposes of statistical evaluation the distribution of thisrelative duration was brought closer to normality bymeans ofa logarithmic transformationWe fitted a generalized additivemixed model [132] to log relative duration with randomintercepts for word and speaker log LDL edge weight aspredictor of interest and local speech rate (Phrase Rate)log neighborhood density (Ncount) and log word frequencyas control variables A summary of this model is presentedin Table 7 As expected an increase in LDL Edge Weight

goes hand in hand with a reduction in the duration of thebranching segment In other words when the edge weight isreduced production is slowed

The adverse effects of weaknesses in a wordrsquos path inthe graph where the path branches is of interest againstthe discussion about the function of weak links in diphonetransitions in the literature on comprehension [133ndash138] Forcomprehension it has been argued that bigram or diphonelsquotroughsrsquo ie low transitional probabilities in a chain of hightransitional probabilities provide points where sequencesare segmented and parsed into their constituents Fromthe perspective of discrimination learning however low-probability transitions function in exactly the opposite wayfor comprehension [54 124] Naive discriminative learningalso predicts that troughs should give rise to shorter process-ing times in comprehension but not because morphologicaldecomposition would proceed more effectively Since high-frequency boundary bigrams and diphones are typically usedword-internally across many words they have a low cuevalidity for thesewords Conversely low-frequency boundarybigrams are much more typical for very specific base+affixcombinations and hence are better discriminative cues thatafford enhanced activation of wordsrsquo lexomes This in turngives rise to faster processing (see also Ramscar et al [78] forthe discriminative function of low-frequency bigrams in low-frequency monolexomic words)

There is remarkable convergence between the directedgraphs for speech production such as illustrated in Figure 9and computational models using temporal self-organizing

Complexity 25

Table 7 Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeyecorpus The trend for log frequency is positive accelerating TPRS thin plate regression spline

A parametric coefficients Estimate Std Error t-value p-valueIntercept -19165 00592 -323908 lt 00001Log LDL Edge Weight -01358 00400 -33960 00007Phrase Rate 00047 00021 22449 00248Log Ncount 01114 00189 58837 lt 00001B smooth terms edf Refdf F-value p-valueTPRS log Frequency 19007 19050 75495 00004random intercepts word 11498839 13230000 314919 lt 00001random intercepts speaker 302995 390000 345579 lt 00001

maps (TSOMs [139ndash141]) TSOMs also use error-drivenlearning and in recent work attention is also drawn toweak edges in wordsrsquo paths in the TSOM and the conse-quences thereof for lexical processing [142] We are not usingTSOMs however one reason being that in our experiencethey do not scale up well to realistically sized lexiconsA second reason is that we are pursuing the hypothesisthat form serves meaning and that self-organization ofform by itself is not necessary This hypothesis is likelytoo strong especially as we do not provide a rationale forthe trigram and triphone units that we use to representaspects of form It is worth noting that spatial organizationof form features is not restricted to TSOMs Although inour approach the spatial organization of triphone features isleft unspecified such an organization can be easily enforced(see [85] for further details) by using an algorithm suchas graphopt which self-organizes the vertices of a graphinto a two-dimensional plane Figure 9 was obtained withthis algorithm (Graphopt was developed for the layout oflarge graphs httpwwwschmuhlorggraphopt andis implemented in the igraph package Graphopt uses basicprinciples of physics to iteratively determine an optimallayout Each node in the graph is given both mass and anelectric charge and edges between nodes are modeled asspringsThis sets up a system inwhich there are attracting andrepelling forces between the vertices of the graph and thisphysical system is simulated until it reaches an equilibrium)

54 Discussion Our production network reconstructsknown wordsrsquo forms with a high accuracy 999 formonolexomic words 92 for inflected words and 99 forderived words For novel complex words accuracy under10-fold cross-validation was 62 for inflected words and 75for derived words

The drop in performance for novel forms is perhapsunsurprising given that speakers understand many morewords than they themselves produce even though they hearnovel forms on a fairly regular basis as they proceed throughlife [79 143]

However we also encountered several technical problemsthat are not due to the algorithm but to the representationsthat the algorithm has had to work with First it is surprisingthat accuracy is as high as it is given that the semanticvectors are constructed from a small corpus with a very

simple discriminative algorithm Second we encounteredinconsistencies in the phonological forms retrieved from thecelex database inconsistencies that in part are due to theuse of discrete triphones Third many cases where the modelpredictions do not match the targeted triphone sequence thetargeted forms have a minor irregularity (eg resound with[z] instead of [s]) Fourth several of the typical errors that themodel makes are known kinds of speech errors or reducedforms that one might encounter in engaged conversationalspeech

It is noteworthy that themodel is almost completely data-driven The triphones are derived from celex and otherthan somemanual corrections for inconsistencies are derivedautomatically from wordsrsquo phone sequences The semanticvectors are based on the tasa corpus and were not in any wayoptimized for the production model Given the triphone andsemantic vectors the transformation matrix is completelydetermined No by-hand engineering of rules and exceptionsis required nor is it necessary to specify with hand-codedlinks what the first segment of a word is what its secondsegment is etc as in the weaver model [37] It is only in theheuristic graph algorithm that three thresholds are requiredin order to avoid that graphs become too large to remaincomputationally tractable

The speech production system that emerges from thisapproach comprises first of all an excellent memory forforms that have been encountered before Importantly thismemory is not a static memory but a dynamic one It is not arepository of stored forms but it reconstructs the forms fromthe semantics it is requested to encode For regular unseenforms however a second network is required that projects aregularized semantic space (obtained by accumulation of thesemantic vectors of content and inflectional or derivationalfunctions) onto the triphone output space Importantly itappears that no separate transformations or rules are requiredfor individual inflectional or derivational functions

Jointly the two networks both of which can also betrained incrementally using the learning rule ofWidrow-Hoff[44] define a dual route model attempts to build a singleintegrated network were not successfulThe semantic vectorsof derived words are too idiosyncratic to allow generalizationfor novel forms It is the property of the semantic vectorsof derived lexomes described in Section 323 namely thatthey are close to but not inside the cloud of their content

26 Complexity

lexomes which makes them productive Novel forms do notpartake in the idiosyncracies of lexicalized existing wordsbut instead remain bound to base and derivational lexomesIt is exactly this property that makes them pronounceableOnce produced a novel form will then gravitate as experi-ence accumulates towards the cloud of semantic vectors ofits morphological category meanwhile also developing itsown idiosyncracies in pronunciation including sometimeshighly reduced pronunciation variants (eg tyk for Dutchnatuurlijk ([natyrlk]) [111 144 145] It is perhaps possibleto merge the two routes into one but for this much morefine-grained semantic vectors are required that incorporateinformation about discourse and the speakerrsquos stance withrespect to the adressee (see eg [115] for detailed discussionof such factors)

6 Bringing in Time

Thus far we have not considered the role of time The audiosignal comes in over time longerwords are typically readwithmore than one fixation and likewise articulation is a temporalprocess In this section we briefly outline how time can bebrought into the model We do so by discussing the readingof proper names

Proper names (morphologically compounds) pose achallenge to compositional theories as the people referredto by names such as Richard Dawkins Richard NixonRichardThompson and SandraThompson are not obviouslysemantic composites of each other We therefore assignlexomes to names irrespective of whether the individu-als referred to are real or fictive alive or dead Further-more we assume that when the personal name and familyname receive their own fixations both names are under-stood as pertaining to the same named entity which istherefore coupled with its own unique lexome Thus forthe example sentences in Table 8 the letter trigrams ofJohn are paired with the lexome JohnClark and like-wise the letter trigrams of Clark are paired with this lex-ome

By way of illustration we obtained thematrix of semanticvectors training on 100 randomly ordered tokens of thesentences of Table 8 We then constructed a trigram cuematrix C specifying for each word (John Clark wrote )which trigrams it contains In parallel a matrix L specify-ing for each word its corresponding lexomes (JohnClarkJohnClark write ) was set up We then calculatedthe matrix F by solving CF = L and used F to cal-culate estimated (predicted) semantic vectors L Figure 10presents the correlations of the estimated semantic vectorsfor the word forms John John Welsch Janet and Clarkwith the targeted semantic vectors for the named entitiesJohnClark JohnWelsch JohnWiggam AnneHastieand JaneClark For the sequentially read words John andWelsch the semantic vector generated by John and thatgenerated by Welsch were summed to obtain the integratedsemantic vector for the composite name

Figure 10 upper left panel illustrates that upon readingthe personal name John there is considerable uncertainty

about which named entity is at issue When subsequentlythe family name is read (upper right panel) uncertainty isreduced and John Welsch now receives full support whereasother named entities have correlations close to zero or evennegative correlations As in the small world of the presenttoy example Janet is a unique personal name there is nouncertainty about what named entity is at issue when Janetis read For the family name Clark on its own by contrastJohn Clark and Janet Clark are both viable options

For the present example we assumed that every ortho-graphic word received one fixation However depending onwhether the eye lands far enough into the word namessuch as John Clark and compounds such as graph theorycan also be read with a single fixation For single-fixationreading learning events should comprise the joint trigrams ofthe two orthographic words which are then simultaneouslycalibrated against the targeted lexome (JohnClark graph-theory)

Obviously the true complexities of reading such asparafoveal preview are not captured by this example Never-theless as a rough first approximation this example showsa way forward for integrating time into the discriminativelexicon

7 General Discussion

We have presented a unified discrimination-driven modelfor visual and auditory word comprehension and wordproduction the discriminative lexicon The layout of thismodel is visualized in Figure 11 Input (cochlea and retina)and output (typing and speaking) systems are presented inlight gray the representations of themodel are shown in darkgray For auditory comprehension we make use of frequencyband summary (FBS) features low-level cues that we deriveautomatically from the speech signal The FBS features serveas input to the auditory network F119886 that maps vectors of FBSfeatures onto the semantic vectors of S For reading lettertrigrams represent the visual input at a level of granularitythat is optimal for a functional model of the lexicon (see [97]for a discussion of lower-level visual features) Trigrams arerelatively high-level features compared to the FBS featuresFor features implementing visual input at a much lower levelof visual granularity using histograms of oriented gradients[146] see Linke et al [15] Letter trigrams are mapped by thevisual network F119900 onto S

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin; network H_a is also part of the control loop for speech production (see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production). Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].


Table 8: Example sentences for the reading of proper names, with the corresponding lexomes to their right. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student

Figure 10: Correlations (Pearson) of the estimated semantic vectors for John, John Welsch, Janet, and Clark (one panel each) with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.


Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.


Figure 11: Overview of the discriminative lexicon. Input and output systems (cochlea and retina; typing and speaking) are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features (auditory cues: FBS features; orthographic cues: trigrams; semantic vectors: TASA; auditory targets: triphones; orthographic targets: trigrams). Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.


Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
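As an illustration of such incremental updating, the following R sketch applies the Widrow-Hoff (delta) rule to small random cue and outcome matrices standing in for C and S; the matrices, learning rate, and number of passes are arbitrary illustrative choices, not the settings of the present study.

set.seed(1)
C <- matrix(rbinom(20 * 10, 1, 0.3), 20, 10)   # 20 learning events, 10 binary cues
S <- matrix(rnorm(20 * 5), 20, 5)              # 20 five-dimensional semantic vectors

# one Widrow-Hoff update: move the weights towards the current target
widrow_hoff <- function(W, cue, target, rate = 0.01) {
  predicted <- as.numeric(cue %*% W)
  W + rate * outer(cue, target - predicted)
}

W <- matrix(0, ncol(C), ncol(S))               # network weights, initially zero
for (epoch in 1:100) {
  for (i in sample(nrow(C))) {                 # one randomly ordered pass per epoch
    W <- widrow_hoff(W, C[i, ], S[i, ])
  }
}

In the mean, such trial-by-trial estimates converge on the connection weights obtained with the generalized inverse (see the paragraph on exemplars below).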

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions: about the number of inflectional classes required (Estonian: 30 or 300?), about what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), about whether Semitic morphology requires operations on form on one or two tiers [155-157], and about how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the southeast of Papua New Guinea). The language has a substantial inventory of over 1,000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no fewer than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)
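As a small illustration of such conjoint realization, the following R sketch (with made-up, low-dimensional vectors and hypothetical row names) sums the semantic vectors of the content lexome and the inflectional lexomes that sapīvissēmus is understood to express.

set.seed(2)
lexomes <- c("know", "first", "plural", "active", "pluperfect", "subjunctive")
S <- matrix(rnorm(length(lexomes) * 5), length(lexomes), 5,
            dimnames = list(lexomes, NULL))    # toy 5-dimensional semantic vectors

# conjoint semantic vector for sapivissemus: the sum of its lexomes' vectors
s_sapivissemus <- colSums(S[lexomes, ])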

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to update network weights very efficiently during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4,000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4,000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes of older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question of the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning), nor morphological form units such as stems and exponents of any kind, in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many take it as an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning on the other.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40-42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\[ \mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3) \tag{A1} \]


Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\[ \mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \tag{A2} \]

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\[ \mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \tag{A3} \]

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\[ \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix} \tag{A4} \]

Such a mapping of A onto B is given by a matrix F,

\[ \mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} \tag{A5} \]

and it is straightforward to verify that indeed

\[ \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix} \tag{A6} \]

Using matrix notation, we can write

\[ \mathbf{A}\mathbf{F} = \mathbf{B} \tag{A7} \]

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \tag{A8} \]

For numbers, the inverse of multiplication with s is dividing by s:

\[ \left(\frac{1}{s}\right)(sx) = x \tag{A9} \]

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\[ \mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I} \tag{A10} \]

For nonsquare matrices, the inverse Y^{-1} is defined such that

\[ \mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X} \tag{A11} \]

We find the square matrix F that maps the square matrix A onto B as follows:

\[ \begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \end{aligned} \tag{A12} \]

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find the matrix G which maps B onto A, i.e.,

\[ \mathbf{B}\mathbf{G} = \mathbf{A} \tag{A13} \]

Solving this equation proceeds exactly as above:

\[ \begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \end{aligned} \tag{A14} \]


The inverse of B is

\[ \mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \tag{A15} \]

and hence we have

\[ \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G} \tag{A16} \]

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X'.
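As a small numerical illustration (with a made-up singular matrix), the generalized inverse can be obtained in R as follows.

library(MASS)
X  <- matrix(c(1, 2,
               2, 4), 2, 2, byrow = TRUE)   # rank 1, so solve(X) would fail
Xp <- ginv(X)                               # Moore-Penrose generalized inverse, X'
round(X %*% Xp %*% X, 10)                   # recovers X, a defining property of X'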

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

\[ \begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix} \tag{A17} \]

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\[ \mathbf{s} = (2, 2) \tag{A18} \]

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

\[ \begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix} \tag{A19} \]
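For readers who want to verify the calculations in this appendix, the following R code reproduces the 2 × 2 example: it computes F and G with solve() and maps the novel form vector (2, 2) onto (-2, 2).

A <- matrix(c( 3,  4,
              -2, -3), 2, 2, byrow = TRUE)   # form vectors a and b (A2)
B <- matrix(c(-5,  2,
               4, -1), 2, 2, byrow = TRUE)   # semantic vectors x and y (A3)

F <- solve(A) %*% B    # comprehension mapping, F = A^{-1} B (A12)
G <- solve(B) %*% A    # production mapping,    G = B^{-1} A (A14)

A %*% F                # reproduces B (A6)
B %*% G                # reproduces A
c(2, 2) %*% F          # the novel form vector s is mapped onto (-2, 2) (A19)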

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5, and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2). Form features go in at the input nodes, semantic vectors come out at the output nodes, and the connection weights are the transformation matrix.

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol #) to a word's final triphone (the triphone ending with #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at that vertex are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
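The following R sketch illustrates the core of this heuristic with igraph's all_simple_paths; the triphones and their network supports are invented for illustration, and the stem, whole-word, and affix selection steps described above are not implemented.

library(igraph)

# toy triphone inventory with made-up network support ('#' marks a word boundary)
support <- c("#wO" = 0.8, "wOk" = 0.7, "Okt" = 0.4, "kt#" = 0.9,
             "Oks" = 0.3, "ks#" = 0.6)

# edges connect triphones whose final two segments match the next triphone's first two
edges <- c("#wO","wOk",  "wOk","Okt",  "Okt","kt#",
           "wOk","Oks",  "Oks","ks#")
g <- make_graph(edges, directed = TRUE)

# all simple paths from the initial triphone; keep those ending in a final triphone
paths  <- all_simple_paths(g, from = "#wO")
finals <- names(V(g))[grepl("#$", names(V(g)))]
complete <- Filter(function(p) names(p)[length(p)] %in% finals, paths)

# weight each path's summed support by its length and pick the best-supported path
weighted <- sapply(complete, function(p) sum(support[names(p)]) / length(p))
names(complete[[which.max(weighted)]])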

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.


Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85]; for code, see the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272-11276, 2016.

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL. R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Stromqvist, J. Hyona, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA; London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostic, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


cross-validation was carried out for a total of 6236 inflected forms.

As before, the semantic vectors for inflected words were obtained by addition of the corresponding content and inflectional semantic vectors. For each training set, we calculated the transformation matrix G from the S and T matrices of that training set. For an out-of-bag inflected form in the test set, we calculated its semantic vector and multiplied this vector with the transformation matrix (using (21)) to obtain the predicted triphone vector t.
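As an illustration of this procedure, the following R sketch estimates the transformation matrix from a training split via the generalized inverse and maps the semantic vector of a held-out form into triphone space. The matrices S and T and the index vector test are assumed to be available; this is a minimal sketch, not the code used in the study.

```r
library(MASS)  # for ginv(), the Moore-Penrose generalized inverse

train <- setdiff(seq_len(nrow(S)), test)

# G solves S[train, ] %*% G = T[train, ] in the least-squares sense
G <- ginv(S[train, ]) %*% T[train, ]

# the semantic vector of an out-of-bag inflected form (a row of S, itself the
# sum of the content vector and the pertinent inflectional vectors) is mapped
# into triphone space
t_hat <- S[test[1], , drop = FALSE] %*% G

# triphones ranked by their predicted support
head(sort(t_hat[1, ], decreasing = TRUE))
```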

The proportion of forms that were predicted correctly was 0.62. The proportion of forms ranked second was 0.25. Forms that were incorrectly ranked first typically were other inflected forms (including bare stems) that happened to receive stronger support than the targeted form. Such errors are not uncommon in spoken English. For instance, in the Buckeye corpus [110], closest is once reduced to clos. Furthermore, Dell [90] classified forms such as concludement for conclusion and he relax for he relaxes as (noncontextual) errors.

The algorithm failed to produce the targeted form for 3% of the cases. Examples of the forms produced instead are the past tense for blazed being realized with [zId] instead of [zd], the plural mouths being predicted as having [Ts] as coda rather than [Ds], and finest being reduced to finst. The voiceless production for mouth does not follow the dictionary norm, but is used, as attested by on-line pronunciation dictionaries. Furthermore, the voicing alternation is partly unpredictable (see, e.g., [128] for final devoicing in Dutch), and hence model performance here is not unreasonable. We next consider model accuracy for derived words.

5.2.3. Performance on Derived Words. When building the vector space model, we distinguished between inflection and word formation. Inflected words did not receive their own semantic vectors. By contrast, each derived word was assigned its own lexome, together with a lexome for its derivational function. Thus happiness was paired with two lexomes, happiness and ness. Since the semantic matrix for our dataset already contains semantic vectors for derived words, we first investigated how well forms are predicted when derived words are assessed along with monolexomic words, without any further differentiation between the two. To this end, we constructed a semantic matrix S for 4885 words (rows) by 4993 (columns) and constructed the transformation matrix G from this matrix and the corresponding triphone matrix T. The predicted form vectors of T̂ = SG supported the targeted triphones above all other triphones, without exception. Furthermore, the graph-based algorithm correctly reconstructed 99% of all forms, with only 5 cases where it assigned the correct form second rank.

Next, we inspected how the algorithm performs when the semantic vectors of derived forms are obtained from the semantic vectors of their base words and those of their derivational lexomes, instead of using the semantic vectors of the derived words themselves. To allow subsequent evaluation by cross-validation, we selected those derived words that contained an affix that occurred at least 30 times in our data set (again (38), agent (177), ful (45), instrument (82), less (54), ly (127), and ness (57)), to a total of 770 complex words.

We first combined these derived words with the 3987 monolexomic words. For the resulting 4885 words, the T and S matrices were constructed, from which we derived the transformation matrix G and subsequently the matrix of predicted triphone strengths T̂. The proportion of words for which the targeted triphones were the best supported triphones was 0.96, and the graph algorithm performed with an accuracy of 98.9%.

In order to assess the productivity of the system, we evaluated performance on derived words by means of 10-fold cross-validation. For each fold, we made sure that each affix was present proportionally to its frequency in the overall dataset and that a derived word's base word was included in the training set.
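One way to set up such folds is sketched below; the data frame derived with columns word and affix is an assumption for illustration, and the constraint that a base word is available during training is satisfied here because the monolexomic base words are never held out. This illustrates the logic only, not the authors' script.

```r
set.seed(1)                              # reproducible fold assignment
folds <- integer(nrow(derived))

# stratify by affix: within each affix, spread its words evenly over 10 folds,
# so every affix is represented roughly proportionally in each training set
for (ix in split(seq_len(nrow(derived)), derived$affix)) {
  folds[ix] <- sample(rep_len(1:10, length(ix)))
}
derived$fold <- folds
```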

We first examined performance when the transformation matrix is estimated from a semantic matrix that contains the semantic vectors of the derived words themselves, whereas semantic vectors for unseen derived words are obtained by summation of the semantic vectors of base and affix. It turns out that this set-up results in a total failure. In the present framework, the semantic vectors of derived words are too idiosyncratic and too finely tuned to their own collocational preferences. They are too scattered in semantic space to support a transformation matrix that supports the triphones of both stem and affix.

We then used exactly the same cross-validation procedure as outlined above for inflected words, constructing semantic vectors for derived words from the semantic vectors of base and affix, and calculating the transformation matrix G from these summed vectors. For unseen derived words, G was used to transform the semantic vectors for novel derived words (obtained by summing the vectors of base and affix) into triphone space.
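In code, the semantic vector of an unseen derived word is simply the sum of two rows of S, which is then mapped into triphone space with the G of the pertinent training fold. The lexome names below (happy, NESS) are hypothetical placeholders, not names from the actual dataset.

```r
# semantic vector of a novel derived word, built from base and affix lexomes
sem_happiness <- S["happy", ] + S["NESS", ]      # hypothetical lexome names

# map the summed semantic vector into triphone space with the fold's G
t_hat <- matrix(sem_happiness, nrow = 1) %*% G
```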

For 75% of the derived words, the graph algorithm reconstructed the targeted triphones. For 14% of the derived nouns, the targeted form was not retrieved. These include cases such as resound, which the model produced with [s] instead of the (unexpected) [z], sewer, which the model produced with R instead of only R, and tumbler, where the model used syllabic [l] instead of the nonsyllabic [l] given in the celex target form.

5.3. Weak Links in the Triphone Graph and Delays in Speech Production. An important property of the model is that the support for trigrams changes where a word's graph branches out for different morphological variants. This is illustrated in Figure 9 for the inflected form blending. Support for the stem-final triphone End is still at 1, but then blend can end, or continue as blends, blended, or blending. The resulting uncertainty is reflected in the weights on the edges leaving End. In this example, the targeted ing form is driven by the inflectional semantic vector for continuous, and hence the edge to ndI is best supported. For other forms, the edge weights will be different, and hence other paths will be better supported.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI, henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, obtained by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
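The model summarized in Table 7 can be specified along the following lines with the mgcv package; the data frame buckeye and its column names are assumptions for illustration, not the original analysis script.

```r
library(mgcv)

# Word and Speaker are assumed to be factors
m <- bam(LogRelativeDuration ~
           LogLDLEdgeWeight +         # predictor of interest
           PhraseRate +               # local speech rate
           LogNcount +                # log neighborhood density
           s(LogFrequency) +          # thin plate regression spline for log frequency
           s(Word, bs = "re") +       # random intercepts for word
           s(Speaker, bs = "re"),     # random intercepts for speaker
         data = buckeye)
summary(m)
```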

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).


Complexity 25

Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive, accelerating. TPRS: thin plate regression spline.

A. parametric coefficients   Estimate    Std. Error   t-value    p-value
Intercept                    -1.9165     0.0592       -32.3908   < 0.0001
Log LDL Edge Weight          -0.1358     0.0400        -3.3960     0.0007
Phrase Rate                   0.0047     0.0021         2.2449     0.0248
Log Ncount                    0.1114     0.0189         5.8837   < 0.0001

B. smooth terms               edf         Ref.df       F-value    p-value
TPRS log Frequency              1.9007      1.9050      7.5495     0.0004
random intercepts word       1149.8839   1323.0000     31.4919   < 0.0001
random intercepts speaker      30.2995     39.0000     34.5579   < 0.0001

There is remarkable convergence between the directed graphs for speech production, such as illustrated in Figure 9, and computational models using temporal self-organizing maps (TSOMs [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs (http://www.schmuhl.org/graphopt/) and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
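For illustration, the following sketch lays out a small directed triphone graph with igraph's graphopt algorithm; the toy edge list stands in for the much larger graphs used in the study.

```r
library(igraph)

# a toy path through triphone space (edge list given as successive vertex pairs)
edges <- c("#bl", "blE",  "blE", "lEn",  "lEn", "End",
           "End", "ndI",  "ndI", "dIN",  "dIN", "IN#")
g <- make_graph(edges, directed = TRUE)

# physics-based self-organization of the vertices in the plane
coords <- layout_with_graphopt(g, niter = 500, charge = 0.001, mass = 30)
plot(g, layout = coords, edge.arrow.size = 0.3)
```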

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the weaver model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L, specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...), was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
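The logic of this toy example can be sketched as follows; the two-word fragment, the indicator coding of L (standing in for the real-valued semantic vectors), and all object names are illustrative only.

```r
library(MASS)

words   <- c("John", "Clark")
lexomes <- c("JohnClark", "write")       # hypothetical lexome inventory

# letter trigrams of a word, with '#' marking the word boundary
trigrams <- function(w) {
  w <- paste0("#", w, "#")
  sapply(1:(nchar(w) - 2), function(i) substr(w, i, i + 2))
}

cues <- sort(unique(unlist(lapply(words, trigrams))))

# cue matrix C: which trigrams occur in which word
C <- t(sapply(words, function(w) as.numeric(cues %in% trigrams(w))))
dimnames(C) <- list(words, cues)

# lexome matrix L: both 'John' and 'Clark' are paired with the lexome JohnClark
L <- rbind(John = c(1, 0), Clark = c(1, 0))
colnames(L) <- lexomes

Fmat  <- ginv(C) %*% L      # estimated mapping F, solving C F = L
L_hat <- C %*% Fmat         # estimated (predicted) semantic vectors
```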

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production.
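Because all mappings are linear, this indirect route for reading (trigrams to triphones to semantics) is itself a linear mapping, namely the product of the two networks. A minimal sketch, assuming trigram, triphone, and semantic matrices Co, Ta, and S with matching rows:

```r
library(MASS)

Ka <- ginv(Co) %*% Ta        # trigrams  -> auditory targets (triphones)
Ha <- ginv(Ta) %*% S         # triphones -> semantic vectors

# the composed indirect route: trigrams -> semantics via auditory targets
S_hat_indirect <- Co %*% (Ka %*% Ha)
```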


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants [lexomes: JohnClark write great book about ant]
John Clark published a great book about dangerous ants [lexomes: JohnClark publish a great book about dangerous ant]
John Welsch makes great photographs of airplanes [lexomes: JohnWelsch make great photograph of airplane]
John Welsch has a collection of great photographs of airplanes landing [lexomes: JohnWelsch have a collection of great photograph airplane landing]
John Wiggam will build great harpsichords [lexomes: JohnWiggam future build great harpsichord]
John Wiggam will be a great expert in tuning harpsichords [lexomes: JohnWiggam future be a great expert in tuning harpsichord]
Anne Hastie has taught statistics to great second year students [lexomes: AnneHastie have teach statistics to great second year student]
Anne Hastie has taught probability to great first year students [lexomes: AnneHastie have teach probability to great first year student]
Janet Clark teaches graph theory to second year students [lexomes: JanetClark teach graphtheory to second year student]

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149], can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]).


Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

One such interaction is explored in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
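A minimal sketch of such trial-by-trial updating with the Widrow-Hoff rule is given below; the list of learning events, the learning rate eta, and the matrix dimensions are assumptions for illustration.

```r
# one Widrow-Hoff update: nudge W so that the cue vector (row matrix 'cue')
# better predicts the target vector (row matrix 'target')
update_wh <- function(W, cue, target, eta = 0.001) {
  error <- target - cue %*% W        # prediction error for this learning event
  W + eta * crossprod(cue, error)    # W + eta * t(cue) %*% error
}

W <- matrix(0, nrow = n_cues, ncol = n_outcomes)
for (ev in events) {                 # events in their temporal order
  W <- update_wh(W, ev$cue, ev$target)
}
```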

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension (we note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference for sequences of words, and possibly compounds and fixed expressions, for further research).

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production) raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep the problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding; the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina, on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis, compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\[
\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A1}
\]


Figure 12: Points a and b can be transformed into points x and y by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\[
\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2}
\]

In addition to the points a and b, Figure 12 also shows two other datapoints, x and y. The matrix for these two points is

\[
\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3}
\]

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\[
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4}
\]

Such a mapping of A onto B is given by a matrix F,

\[
\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5}
\]

and it is straightforward to verify that indeed

\[
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6}
\]

Using matrix notation, we can write

\[
\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A7}
\]
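As an illustration, this toy example can be replayed in R (the language of the packages referred to elsewhere in this study). The object names are ours; the matrix called F in the text is named Fmat here only because F is shorthand for FALSE in R.

```r
A <- matrix(c( 3,  4,
              -2, -3), nrow = 2, byrow = TRUE)   # words' form vectors as rows
B <- matrix(c(-5,  2,
               4, -1), nrow = 2, byrow = TRUE)   # words' semantic vectors as rows
Fmat <- matrix(c( 1,  2,
                 -2, -1), nrow = 2, byrow = TRUE)

A %*% Fmat    # reproduces B, i.e., equation (A7): AF = B
```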

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8}
\]

For numbers, the inverse of multiplication with s is division by s:

\[
\left(\frac{1}{s}\right)(sx) = x. \tag{A9}
\]

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\[
\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10}
\]

For nonsquare matrices, the inverse Y^{-1} is defined such that

\[
\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11}
\]

We find the square matrix F that maps the square matrix A onto B as follows:

\[
\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B}, \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}, \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}, \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \tag{A12}
\]

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).
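Continuing the snippet above, the transformation matrix can also be recovered from A and B directly, following (A12); for this particular example, A is indeed its own inverse.

```r
solve(A)           # the matrix inverse A^{-1}; here identical to A itself
solve(A) %*% B     # equals Fmat, cf. equation (A12): F = A^{-1} B
```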

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\[
\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A13}
\]

Solving this equation proceeds exactly as above:

\[
\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A}, \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}, \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}, \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \tag{A14}
\]


The inverse of B is

\[
\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}, \tag{A15}
\]

and hence we have

\[
\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
=
\begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix}
= \mathbf{G}. \tag{A16}
\]

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
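A minimal illustration of the generalized inverse with the MASS package; the singular matrix used here is our own toy example, not taken from the study.

```r
library(MASS)                  # provides ginv()

X <- matrix(c(1, 2,
              2, 4), nrow = 2, byrow = TRUE)   # singular: row 2 is 2 * row 1

Xprime <- ginv(X)              # Moore-Penrose generalized inverse of X
X %*% Xprime %*% X             # reproduces X, a defining property of X'
```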

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

\[
\begin{pmatrix} 3 & 4 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \end{pmatrix}. \tag{A17}
\]

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i_1 and i_2 are set to 3 and 4, respectively. To obtain the activations o_1 = -5 and o_2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o_1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o_2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\[
\mathbf{s} = (2, 2), \tag{A18}
\]

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

\[
\begin{pmatrix} 2 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -2 & 2 \end{pmatrix}. \tag{A19}
\]
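Continuing the snippets above, this forward pass through the network of Figure 13 is a single vector-matrix product in R:

```r
s <- c(2, 2)        # a novel form vector
s %*% Fmat          # its predicted semantic vector: (-2, 2)
```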

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i_1 and i_2 (in blue), and two output nodes, o_1 and o_2 (in red); form features go in at the input nodes, semantic vectors come out at the output nodes, and the connection weights are the transformation matrix. When the input vector (3, 4) is presented to the network, the value at o_1 is 3 × 1 + 4 × (-2) = -5 and that at o_2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol) to a word's final triphone (the triphone ending with the word boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
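The following R sketch illustrates the core of this procedure with igraph. The triphone inventory, the support values, and the scoring are simplified stand-ins for the quantities produced by the production network; the actual implementation is available in the WpmWithLdl package [85].

```r
library(igraph)

# toy network support for the triphones of "blend" (# marks the word boundary)
support <- c("#bl" = 0.9, "ble" = 0.8, "len" = 0.7, "end" = 0.6, "nd#" = 0.5)

# edge between two triphones if segments 2-3 of the first match segments 1-2 of the second
tri   <- names(support)
edges <- character(0)
for (v in tri) {
  for (w in tri) {
    if (substr(v, 2, 3) == substr(w, 1, 2)) edges <- c(edges, v, w)
  }
}
g <- graph_from_edgelist(matrix(edges, ncol = 2, byrow = TRUE))

# all simple paths from an initial (#-initial) to a final (#-final) triphone
paths  <- all_simple_paths(g, from = "#bl", to = "nd#")

# weight path support by path length (here: summed support divided by length)
scores <- sapply(paths, function(p) mean(support[as_ids(p)]))
as_ids(paths[[which.max(scores)]])    # the best-supported triphone path
```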

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected



Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
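A heatmap of this kind can be generated in R along the following lines; S stands for the matrix of semantic vectors (one row per lexome), and the word subset and the color cap are illustrative assumptions rather than the settings used for Figure 14.

```r
words <- c("we", "us", "she", "her", "across", "over")   # illustrative subset
r <- cor(t(S[words, ]))        # pairwise correlations of the row vectors of S
r[r > 0.25] <- 0.25            # cap the color range, as in Figure 14
heatmap(r, symm = TRUE, margins = c(6, 6))
```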


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry. Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.

[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.

[111] K. Johnson, "Massive reduction in conversational American English," in Proceedings of the Spontaneous Speech: Data and Analysis, Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.

[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.

[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: the acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.

[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.

[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.

[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.

[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.

[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.

[178] A. Kostic, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.

[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.

[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.

[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


Figure 9: The directed graph for blending. Vertices represent triphones, including triphones such as IIN that were posited by the graph algorithm to bridge potentially relevant transitions that are not instantiated in the training data. Edges are labelled with their edge weights. The graph, the edge weights of which are specific to the lexomes blend and continuous, incorporates not only the path for blending, but also the paths for blend, blends, and blended.

The uncertainty that arises at branching points in the triphone graph is of interest in the light of several experimental results. For instance, inter-keystroke intervals in typing become longer at syllable and morph boundaries [129, 130]. Evidence from articulography suggests variability is greater at morph boundaries [131]. Longer keystroke execution times and greater articulatory variability are exactly what is expected under reduced edge support. We therefore examined whether the edge weights at the first branching point are predictive for lexical processing. To this end, we investigated the acoustic duration of the segment at the center of the first triphone with a reduced LDL edge weight (in the present example, d in ndI; henceforth 'branching segment') for those words in our study that are attested in the Buckeye corpus [110].

The dataset we extracted from the Buckeye corpus comprised 15105 tokens of a total of 1327 word types, collected from 40 speakers. For each of these words, we calculated the relative duration of the branching segment, calculated by dividing segment duration by word duration. For the purposes of statistical evaluation, the distribution of this relative duration was brought closer to normality by means of a logarithmic transformation. We fitted a generalized additive mixed model [132] to log relative duration, with random intercepts for word and speaker, log LDL edge weight as predictor of interest, and local speech rate (Phrase Rate), log neighborhood density (Ncount), and log word frequency as control variables. A summary of this model is presented in Table 7. As expected, an increase in LDL Edge Weight goes hand in hand with a reduction in the duration of the branching segment. In other words, when the edge weight is reduced, production is slowed.
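A sketch of this model specification with the mgcv package [132] is given below; the data frame buckeye and its column names are our own placeholders for the variables summarized in Table 7, not the original analysis script.

```r
library(mgcv)

m <- bam(LogRelDur ~ LogEdgeWeight + PhraseRate + LogNcount +   # parametric terms
           s(LogFrequency) +                                    # TPRS for log frequency
           s(Word, bs = "re") + s(Speaker, bs = "re"),          # random intercepts
         data = buckeye)
summary(m)
```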

The adverse effect of weaknesses in a word's path in the graph, where the path branches, is of interest against the discussion about the function of weak links in diphone transitions in the literature on comprehension [133–138]. For comprehension, it has been argued that bigram or diphone 'troughs', i.e., low transitional probabilities in a chain of high transitional probabilities, provide points where sequences are segmented and parsed into their constituents. From the perspective of discrimination learning, however, low-probability transitions function in exactly the opposite way for comprehension [54, 124]. Naive discriminative learning also predicts that troughs should give rise to shorter processing times in comprehension, but not because morphological decomposition would proceed more effectively. Since high-frequency boundary bigrams and diphones are typically used word-internally across many words, they have a low cue validity for these words. Conversely, low-frequency boundary bigrams are much more typical for very specific base+affix combinations, and hence are better discriminative cues that afford enhanced activation of words' lexomes. This in turn gives rise to faster processing (see also Ramscar et al. [78] for the discriminative function of low-frequency bigrams in low-frequency monolexomic words).

There is remarkable convergence between the directed graphs for speech production such as illustrated in Figure 9 and computational models using temporal self-organizing


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients      Estimate  Std. Error   t-value   p-value
Intercept                        -1.9165      0.0592  -32.3908  < 0.0001
Log LDL Edge Weight              -0.1358      0.0400   -3.3960    0.0007
Phrase Rate                       0.0047      0.0021    2.2449    0.0248
Log Ncount                        0.1114      0.0189    5.8837  < 0.0001

B. smooth terms                       edf      Ref.df   F-value   p-value
TPRS log Frequency                 1.9007      1.9050    7.5495    0.0004
random intercepts word          1149.8839   1323.0000   31.4919  < 0.0001
random intercepts speaker         30.2995     39.0000   34.5579  < 0.0001

maps (TSOMs; [139–141]). TSOMs also use error-driven learning, and in recent work attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning, and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
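With igraph, such a layout can be computed as shown below; g is a directed triphone graph as in the sketch accompanying Appendix B, and since the parameters used for the published figure are not known to us, the defaults are used.

```r
set.seed(1)                           # graphopt layouts are stochastic
coords <- layout_with_graphopt(g)     # mass/charge/spring-based self-organization
plot(g, layout = coords, edge.arrow.size = 0.3)
```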

5.4. Discussion. Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the celex database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from celex and, other than some manual corrections for inconsistencies, are obtained automatically from words' phone sequences. The semantic vectors are based on the tasa corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the WEAVER model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content


lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).
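For concreteness, a minimal sketch of an incremental Widrow-Hoff (delta rule) update for such a linear network is given below; the function, its argument names, and the toy dimensions are our own, not the implementation used in this study.

```r
# one learning event: cue is a 1 x k row vector, target a 1 x m row vector
widrow_hoff_update <- function(W, cue, target, eta = 0.01) {
  err <- target - cue %*% W        # prediction error under the current weights
  W + eta * crossprod(cue, err)    # W := W + eta * t(cue) %*% err
}

W <- matrix(0, nrow = 3, ncol = 2)   # toy network: 3 cues, 2 semantic dimensions
W <- widrow_hoff_update(W, matrix(c(1, 0, 1), 1), matrix(c(0.5, -0.2), 1))
```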

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise, articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L, specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...), was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
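A sketch of this toy computation in R; the cue matrix C (rows: word forms, columns: letter trigrams), the matrix L (rows: word forms, columns: the semantic dimensions of the lexomes they express), and the matrix of targeted semantic vectors S are assumed to have been built from the sentences in Table 8, and only the solution and evaluation steps are shown.

```r
library(MASS)

Fmap <- ginv(C) %*% L      # least-squares solution of CF = L
Lhat <- C %*% Fmap         # estimated (predicted) semantic vectors

# support for the named entity JohnClark given the word form "John"
cor(Lhat["John", ], S["JohnClark", ])
```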

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input, at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images), represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that, for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

Sentence | Lexomes
John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student


Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark (one panel each; y-axis: Pearson correlation) with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see [47] for some evidence). One such interaction is explored


[Figure 11 comprises the input systems (cochlea, retina), the auditory cues (FBS features) and orthographic cues (trigrams), the semantic vectors (TASA), the auditory targets (triphones) and orthographic targets (trigrams), the output systems (speaking, typing), and the networks F_a, F_o, G_a, G_o, H_a, K_a, C_a, C_o, T_a, and T_o connecting these representations and the semantic matrix S.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
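As a concrete illustration of such trial-by-trial estimation, the sketch below implements the Widrow-Hoff (delta-rule) update for a linear network in R. The function and variable names (widrow_hoff, cues, targets, eta) and the learning rate are ours; this is a minimal sketch of the rule, not the implementation used in the study.

```r
# Minimal sketch of trial-by-trial Widrow-Hoff updating for a linear network W
# that maps cue vectors (rows of `cues`) onto outcome vectors (rows of `targets`).
widrow_hoff <- function(cues, targets, eta = 0.01, n_epochs = 1) {
  W <- matrix(0, nrow = ncol(cues), ncol = ncol(targets))
  for (epoch in seq_len(n_epochs)) {
    for (i in seq_len(nrow(cues))) {
      c_i  <- cues[i, , drop = FALSE]        # 1 x n_cues row vector for this trial
      t_i  <- targets[i, , drop = FALSE]     # 1 x n_outcomes target vector
      pred <- c_i %*% W                      # current prediction
      W    <- W + eta * t(c_i) %*% (t_i - pred)  # error-driven update
    }
  }
  W
}
```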

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the southeast of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds; learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older, and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change


considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes of young children are different from the corresponding lexomes of older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.
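The claimed correspondence between incremental updating and the endstate estimate can be illustrated numerically. The following R sketch uses a toy cue matrix and noise-free toy outcomes (our own invention, chosen so that the incremental estimate converges cleanly); with real, noisy data the match holds for the expected weights, as stated above.

```r
library(MASS)  # for ginv()

set.seed(1)
C      <- matrix(rbinom(40, 1, 0.5), nrow = 10)  # toy cue matrix: 10 events, 4 cues
B_true <- matrix(rnorm(12), nrow = 4)            # toy mapping onto 3 semantic dimensions
S      <- C %*% B_true                           # toy semantic vectors generated by that mapping

W_end <- ginv(C) %*% S     # endstate estimate via the generalized inverse

# Incremental Widrow-Hoff estimation: many randomly ordered passes over the events.
W   <- matrix(0, nrow = ncol(C), ncol = ncol(S))
eta <- 0.05
for (epoch in 1:2000) {
  for (i in sample(nrow(C))) {
    c_i <- C[i, , drop = FALSE]
    W   <- W + eta * t(c_i) %*% (S[i, , drop = FALSE] - c_i %*% W)
  }
}
max(abs(W - W_end))  # negligible: incremental and endstate estimates coincide
```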

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of

the present study, there are no morphological symbols (units combining form and meaning), nor morphological form units such as stems and exponents, of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations, but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of


categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear: networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

$$\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A1}$$



Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

$$\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2}$$

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

$$\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3}$$

Let us assume that points a and b represent the forms of two words ω₁ and ω₂, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:
$$\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4}$$

Such a mapping of A onto B is given by a matrix F,

$$\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5}$$

and it is straightforward to verify that indeed
$$\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6}$$

Using matrix notation, we can write

$$\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A7}$$
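In R, this toy mapping can be checked directly (a small illustration of the arithmetic above, not code from the study; the name Fmat is ours, standing in for F):

```r
A    <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)   # word forms a and b as rows
Fmat <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)   # the transformation F
A %*% Fmat   # yields B, i.e., rows (-5, 2) and (4, -1)
```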

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8}$$

For numbers, the inverse of multiplication with s is dividing by s:
$$\left(\frac{1}{s}\right)(sx) = x. \tag{A9}$$

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:

$$\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10}$$

For nonsquare matrices, the inverse Y⁻¹ is defined such that

$$\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11}$$

We find the square matrix F that maps the square matrix A onto B as follows:

$$\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B} \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \tag{A12}$$

Since for the present example A and its inverse happen to be identical (A⁻¹ = A), we obtain (A6).
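Continuing the same toy example in R, F can be recovered from A and B exactly as in (A12); solve() computes the inverse of a nonsingular matrix (again, an illustration of ours rather than code from the study):

```r
A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
solve(A)         # the inverse of A, which here equals A itself
solve(A) %*% B   # recovers F: rows (1, 2) and (-2, -1)
```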

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

$$\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A13}$$

Solving this equation proceeds exactly as above:

$$\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A} \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \tag{A14}$$


The inverse of B is

$$\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}, \tag{A15}$$

and hence we have

$$\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} = \mathbf{G}. \tag{A16}$$

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
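For instance, the brief R snippet below (a toy example of ours) applies ginv to a singular matrix, for which solve would fail:

```r
library(MASS)  # provides ginv(), the Moore-Penrose generalized inverse

X <- matrix(c(1, 2,
              2, 4), nrow = 2, byrow = TRUE)  # singular: second row = 2 * first row
# solve(X) would throw an error here; the generalized inverse still exists.
X_ginv <- ginv(X)    # plays the role of X' in the text
X %*% X_ginv %*% X   # returns X, a defining property of the Moore-Penrose inverse
```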

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:
$$\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix}. \tag{A17}$$

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i₁ and i₂ are set to 3 and 4, respectively. To obtain the activations o₁ = -5 and o₂ = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o₁ we obtain the value 3 × 1 + 4 × -2 = -5, and for o₂ we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

$$\mathbf{s} = (2, 2), \tag{A18}$$

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
$$\begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix}. \tag{A19}$$

[Figure 13: a two-layer network with input nodes i₁ and i₂ (form features go in here), output nodes o₁ and o₂ (semantic vectors come out here), and connection weights 1, 2, -2, -1, which constitute the transformation matrix.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i₁ and i₂ (in blue), and two output nodes, o₁ and o₂ (in red). When the input vector (3, 4) is presented to the network, the value at o₁ is 3 × 1 + 4 × -2 = -5, and that at o₂ is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices, and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word, with network support exceeding 0.1, are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
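The path-enumeration and ranking step can be sketched with igraph as follows. The triphones, support values, edge list, and object names below are invented for illustration, and the raw support of a path is here taken to be the summed vertex support; the graph actually used is, of course, far larger.

```r
library(igraph)

# Toy triphone graph for a single word (illustrative values only).
triphones <- c("#ha", "han", "and", "nd#")
support   <- c(0.8, 0.6, 0.7, 0.9)      # hypothetical network support per vertex
names(support) <- triphones

edges <- c("#ha", "han",
           "han", "and",
           "and", "nd#")
g <- make_graph(edges, directed = TRUE)

# All simple paths from an initial triphone (starting with #)
# to a final triphone (ending with #).
paths <- all_simple_paths(g, from = "#ha", to = "nd#")

# Weighted path support: raw (summed) support divided by path length.
weighted_support <- sapply(paths, function(p) sum(support[p$name]) / length(p))
best_path <- paths[[which.max(weighted_support)]]
best_path$name   # the triphone sequence selected to drive articulation
```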

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected



Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.

[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.

[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.

[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.

[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.

[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.

[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.

[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.

[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of the INTERSPEECH, pp. 890–894, 2014.

[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.

[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.

[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.

[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.

[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.

[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.

[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.

[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.

[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.

[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.

[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.

[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.

[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.

[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.

[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.

[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.

[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.

[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.

[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.

[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.

[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.

[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.

[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.

[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.

[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.

[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.

[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.

[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.

[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.

[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.

[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.

[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.

[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.

[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.

[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.

[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.

[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?", Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.

[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.

[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.

[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.

[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.

[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.

[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.

[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.

[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.

[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.

[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.

[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.

[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.

[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.

[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.

[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.

[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.

[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.

[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.

[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.

[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.

[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.

[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.

[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.

[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.

[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.

[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.

[76] C. Shaoul, S. Bitschau, N. Schilling et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.

[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.

[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.

[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.

[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.

[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.

[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.

[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.

[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.

[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.

[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.

[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.

[88] G. C. Van Orden, H. Kloos et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.

[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.

[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.

[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.

[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.

[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.

[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.

[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.

[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.

[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.

[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.

[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.

[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.

[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.

[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.

[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.

[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.

[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.

[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.

[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.

[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.

[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Proceedings of the Spontaneous Speech: Data and Analysis, Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL, R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA; London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, New York, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.


Table 7: Statistics for the partial effects of a generalized additive mixed model fitted to the relative duration of edge segments in the Buckeye corpus. The trend for log frequency is positive accelerating. TPRS: thin plate regression spline.

A. parametric coefficients     Estimate     Std. Error   t-value     p-value
Intercept                      -1.9165      0.0592       -32.3908    < 0.0001
Log LDL Edge Weight            -0.1358      0.0400       -3.3960     0.0007
Phrase Rate                     0.0047      0.0021        2.2449     0.0248
Log Ncount                      0.1114      0.0189        5.8837     < 0.0001

B. smooth terms                edf          Ref.df       F-value     p-value
TPRS log Frequency              1.9007      1.9050        7.5495     0.0004
random intercepts word       1149.8839   1323.0000       31.4919     < 0.0001
random intercepts speaker      30.2995     39.0000       34.5579     < 0.0001
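As an illustration of how a model of this kind can be specified (a sketch, not the authors' original code), the following R call fits a comparable GAMM with the mgcv package; the data frame and all variable names (buckeye, RelativeDuration, LogLDLEdgeWeight, PhraseRate, LogNcount, LogFrequency, Word, Speaker) are assumptions made for the example.

library(mgcv)
m <- bam(RelativeDuration ~
           LogLDLEdgeWeight +            # parametric: log LDL edge weight
           PhraseRate +                  # parametric: phrase rate
           LogNcount +                   # parametric: log neighborhood count
           s(LogFrequency, bs = "tp") +  # thin plate regression spline (TPRS)
           s(Word, bs = "re") +          # random intercepts for word
           s(Speaker, bs = "re"),        # random intercepts for speaker
         data = buckeye, method = "fREML")
summary(m)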

maps (TSOMs [139–141]). TSOMs also use error-driven learning, and in recent work, attention is also drawn to weak edges in words' paths in the TSOM and the consequences thereof for lexical processing [142]. We are not using TSOMs, however, one reason being that in our experience they do not scale up well to realistically sized lexicons. A second reason is that we are pursuing the hypothesis that form serves meaning and that self-organization of form by itself is not necessary. This hypothesis is likely too strong, especially as we do not provide a rationale for the trigram and triphone units that we use to represent aspects of form. It is worth noting that spatial organization of form features is not restricted to TSOMs. Although in our approach the spatial organization of triphone features is left unspecified, such an organization can be easily enforced (see [85] for further details) by using an algorithm such as graphopt, which self-organizes the vertices of a graph into a two-dimensional plane. Figure 9 was obtained with this algorithm. (Graphopt was developed for the layout of large graphs, http://www.schmuhl.org/graphopt, and is implemented in the igraph package. Graphopt uses basic principles of physics to iteratively determine an optimal layout. Each node in the graph is given both mass and an electric charge, and edges between nodes are modeled as springs. This sets up a system in which there are attracting and repelling forces between the vertices of the graph, and this physical system is simulated until it reaches an equilibrium.)
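By way of illustration (and not the code with which Figure 9 was produced), a toy triphone graph can be laid out with the graphopt algorithm from the igraph package as follows; the triphones and the parameter settings are invented for the example.

library(igraph)
triphones <- c("#ha", "han", "and", "nd#")       # invented triphones for "hand"
edges <- data.frame(from = head(triphones, -1),  # an edge links two triphones
                    to   = tail(triphones, -1))  # that overlap in two segments
g <- graph_from_data_frame(edges, directed = TRUE)
coords <- layout_with_graphopt(g, charge = 0.02, mass = 30,
                               spring.length = 0, spring.constant = 1)
plot(g, layout = coords, vertex.size = 30)       # vertices arranged in the plane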

5.4. Discussion

Our production network reconstructs known words' forms with a high accuracy: 99.9% for monolexomic words, 92% for inflected words, and 99% for derived words. For novel complex words, accuracy under 10-fold cross-validation was 62% for inflected words and 75% for derived words.

The drop in performance for novel forms is perhaps unsurprising, given that speakers understand many more words than they themselves produce, even though they hear novel forms on a fairly regular basis as they proceed through life [79, 143].

However, we also encountered several technical problems that are not due to the algorithm but to the representations that the algorithm has had to work with. First, it is surprising that accuracy is as high as it is, given that the semantic vectors are constructed from a small corpus with a very simple discriminative algorithm. Second, we encountered inconsistencies in the phonological forms retrieved from the CELEX database, inconsistencies that in part are due to the use of discrete triphones. Third, in many cases where the model predictions do not match the targeted triphone sequence, the targeted forms have a minor irregularity (e.g., resound with [z] instead of [s]). Fourth, several of the typical errors that the model makes are known kinds of speech errors or reduced forms that one might encounter in engaged conversational speech.

It is noteworthy that the model is almost completely data-driven. The triphones are derived from CELEX and, other than some manual corrections for inconsistencies, are derived automatically from words' phone sequences. The semantic vectors are based on the TASA corpus and were not in any way optimized for the production model. Given the triphone and semantic vectors, the transformation matrix is completely determined. No by-hand engineering of rules and exceptions is required, nor is it necessary to specify with hand-coded links what the first segment of a word is, what its second segment is, etc., as in the WEAVER model [37]. It is only in the heuristic graph algorithm that three thresholds are required, in order to avoid that graphs become too large to remain computationally tractable.

The speech production system that emerges from this approach comprises, first of all, an excellent memory for forms that have been encountered before. Importantly, this memory is not a static memory but a dynamic one. It is not a repository of stored forms, but it reconstructs the forms from the semantics it is requested to encode. For regular unseen forms, however, a second network is required that projects a regularized semantic space (obtained by accumulation of the semantic vectors of content and inflectional or derivational functions) onto the triphone output space. Importantly, it appears that no separate transformations or rules are required for individual inflectional or derivational functions.

Jointly, the two networks, both of which can also be trained incrementally using the learning rule of Widrow-Hoff [44], define a dual route model; attempts to build a single integrated network were not successful. The semantic vectors of derived words are too idiosyncratic to allow generalization for novel forms. It is the property of the semantic vectors of derived lexomes described in Section 3.2.3, namely, that they are close to, but not inside, the cloud of their content lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words, but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., [tyk] for Dutch natuurlijk [natyrlək]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this, much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise, articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise, the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L, specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...), was set up. We then calculated the matrix F by solving CF = L, and used F to calculate estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JaneClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
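A minimal R sketch of this toy computation is given below. The trigram cue matrix C, the lexome matrix L, and the matrix of targeted semantic vectors S are assumed to have been set up from the sentences in Table 8; the object names are illustrative and do not reproduce the original scripts.

library(MASS)                  # for the generalized inverse ginv
F <- ginv(C) %*% L             # solve CF = L: map letter trigrams onto semantics
L_hat <- C %*% F               # estimated (predicted) semantic vectors

# support of the form "John" for the named entity JohnClark:
cor(L_hat["John", ], S["JohnClark", ])

# for sequentially read "John Welsch", sum the two estimated vectors first:
jw <- L_hat["John", ] + L_hat["Welsch", ]
cor(jw, S["JohnWelsch", ])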

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network F_a that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input, at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network F_o onto S.

For reading, we also implemented a network, here denoted by K_a, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network H_a onto the semantic vectors S. This network is motivated not only by our observation that for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network H_a is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

Sentence | Lexomes
John Clark wrote great books about ants | JohnClark write great book about ant
John Clark published a great book about dangerous ants | JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes | JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing | JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords | JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords | JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students | AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students | AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students | JanetClark teach graphtheory to second year student

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark (one panel per word form; y-axis: Pearson correlation) with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JaneClark.

can be predicted from NDL networks using auditory targets to discriminate between lexomes [150].

Speech production is modeled by network G_a, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored


Figure 11: Overview of the discriminative lexicon. Input and output systems (cochlea and retina; speaking and typing) are presented in light gray; the representations of the model (auditory cues: FBS features; orthographic cues: trigrams; semantic vectors: TASA; auditory targets: triphones; orthographic targets: trigrams) are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices, such as F_a, F_o, G_a, H_a, and K_a) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
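A minimal sketch of such trial-by-trial updating with the Widrow-Hoff rule is given below; the function name, the learning rate, and the matrices C and S standing in for cues and targeted vectors are assumptions made for the example.

# one Widrow-Hoff update of the weight matrix F for a single learning event,
# with cue and target given as 1-row matrices; eta is the learning rate
update_wh <- function(F, cue, target, eta = 0.001) {
  pred <- cue %*% F                      # current prediction of the network
  F + eta * t(cue) %*% (target - pred)   # error-driven adjustment of the weights
}

# F <- matrix(0, ncol(C), ncol(S))       # start from zero weights
# for (i in seq_len(nrow(C)))            # one ordered pass over learning events
#   F <- update_wh(F, C[i, , drop = FALSE], S[i, , drop = FALSE])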

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures (typically represented by graphs constructed over semantic primitives; see, e.g., [154]) are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference for sequences of words, and possibly compounds and fixed expressions, for further research.)

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, LDL, and naive discriminative learning (NDL). Because NDL makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In LDL, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as NDL networks, but simply as linear discriminative learning networks. Actual incremental updating of LDL networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of

the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep the problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning, and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (−2, −3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\mathbf{a} = (3, 4), \quad \mathbf{b} = (-2, -3). \tag{A.1}


Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A.2}

In addition to the points a and b, Figure 12 also shows two other datapoints, x and y. The matrix for these two points is

\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A.3}

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A.4}

Such a mapping of A onto B is given by a matrix F,

\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A.5}

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A.6}

Using matrix notation, we can write

\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A.7}

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A.8}

For numbers, the inverse of multiplication with s is dividing by s:

\left(\frac{1}{s}\right)(sx) = x. \tag{A.9}

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A.10}

For nonsquare matrices, the inverse Y^{-1} is defined such that

\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A.11}

We find the square matrix F that maps the square matrices A onto B as follows:

\mathbf{A}\mathbf{F} = \mathbf{B}
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}
\mathbf{I}\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}
\mathbf{F} = \mathbf{A}^{-1}\mathbf{B}. \tag{A.12}

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A.6).
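The worked example can be verified in R with a few lines of base code (a sketch of the calculation, not code from the study):

A <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows are the form vectors a and b
B <- matrix(c(-5, 4, 2, -1), nrow = 2)   # rows are the semantic vectors x and y
F <- solve(A) %*% B                      # yields the matrix F of (A.5)
A %*% F                                  # recovers B, as in (A.6)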

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A.13}

Solving this equation proceeds exactly as above:

\mathbf{B}\mathbf{G} = \mathbf{A}
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}
\mathbf{I}\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}
\mathbf{G} = \mathbf{B}^{-1}\mathbf{A}. \tag{A.14}


The inverse of B is

\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}, \tag{A.15}

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G}. \tag{A.16}

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
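As a small illustration (with an invented matrix, not data from this study), the generalized inverse can be obtained in R as follows:

library(MASS)
X  <- matrix(c(1, 2, 2, 4), nrow = 2)    # rows are linearly dependent: singular
Xp <- ginv(X)                            # Moore-Penrose generalized inverse X'
X %*% Xp %*% X                           # returns X, a defining property of X'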

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3 \;\; 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5 \;\; 2). \tag{A.17}

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = −5 and o2 = 2 for the output vector (−5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × −2 = −5, and for o2 we have 3 × 2 + 4 × −1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (−5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\mathbf{s} = (2, 2), \tag{A.18}

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2 \;\; 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2 \;\; 2). \tag{A.19}
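Continuing the small R sketch given after (A.12), this productive use of the network amounts to a single further matrix multiplication:

s <- c(2, 2)                             # the novel form vector of (A.18)
s %*% F                                  # yields (-2, 2), as in (A.19)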

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue, where form features go in), and two output nodes, o1 and o2 (in red, where semantic vectors come out); the connection weights are the transformation matrix. When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × −2 = −5, and that at o2 is 3 × 2 + 4 × −1 = 2. The network maps the point (3, 4) onto the point (−5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, which is conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
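A schematic R sketch of this heuristic search is given below. The construction of the graph g and its vertex attribute support are assumed to be in place; only the enumeration of paths with all_simple_paths and the length-weighted ranking follow the description above.

library(igraph)
# g: directed triphone graph with a numeric vertex attribute "support" (assumed)
initial <- V(g)[grepl("^#", V(g)$name)]          # word-initial triphones
final   <- V(g)[grepl("#$", V(g)$name)]          # word-final triphones

paths <- unlist(lapply(as_ids(initial), function(v)
           all_simple_paths(g, from = v, to = final)),
         recursive = FALSE)

weighted <- sapply(paths, function(p) sum(p$support) / length(p))
best <- paths[[which.max(weighted)]]   # acoustic image driving articulation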

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected


Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Gunturkun, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018


[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic Morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010


lexomes, which makes them productive. Novel forms do not partake in the idiosyncrasies of lexicalized existing words but instead remain bound to base and derivational lexomes. It is exactly this property that makes them pronounceable. Once produced, a novel form will then gravitate, as experience accumulates, towards the cloud of semantic vectors of its morphological category, meanwhile also developing its own idiosyncrasies in pronunciation, including sometimes highly reduced pronunciation variants (e.g., tyk for Dutch natuurlijk, [natyrlk]) [111, 144, 145]. It is perhaps possible to merge the two routes into one, but for this much more fine-grained semantic vectors are required that incorporate information about discourse and the speaker's stance with respect to the addressee (see, e.g., [115] for detailed discussion of such factors).

6. Bringing in Time

Thus far, we have not considered the role of time. The audio signal comes in over time, longer words are typically read with more than one fixation, and likewise articulation is a temporal process. In this section, we briefly outline how time can be brought into the model. We do so by discussing the reading of proper names.

Proper names (morphologically, compounds) pose a challenge to compositional theories, as the people referred to by names such as Richard Dawkins, Richard Nixon, Richard Thompson, and Sandra Thompson are not obviously semantic composites of each other. We therefore assign lexomes to names, irrespective of whether the individuals referred to are real or fictive, alive or dead. Furthermore, we assume that when the personal name and family name receive their own fixations, both names are understood as pertaining to the same named entity, which is therefore coupled with its own unique lexome. Thus, for the example sentences in Table 8, the letter trigrams of John are paired with the lexome JohnClark, and likewise the letter trigrams of Clark are paired with this lexome.

By way of illustration, we obtained the matrix of semantic vectors, training on 100 randomly ordered tokens of the sentences of Table 8. We then constructed a trigram cue matrix C, specifying for each word (John, Clark, wrote, ...) which trigrams it contains. In parallel, a matrix L specifying for each word its corresponding lexomes (JohnClark, JohnClark, write, ...) was set up. We then calculated the matrix F by solving CF = L, and used F to calculate the estimated (predicted) semantic vectors L̂. Figure 10 presents the correlations of the estimated semantic vectors for the word forms John, John Welsch, Janet, and Clark with the targeted semantic vectors for the named entities JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. For the sequentially read words John and Welsch, the semantic vector generated by John and that generated by Welsch were summed to obtain the integrated semantic vector for the composite name.
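The computation just described can be sketched in a few lines of R. The sketch below is a simplified stand-in for the actual simulation: it uses only one toy sentence, it represents the "semantic" vectors simply as binary lexome indicators rather than the lexome-to-lexome vectors trained on 100 tokens, and it estimates the mapping with the generalized inverse; all names are illustrative.

```r
library(MASS)   # for ginv()

letter_trigrams <- function(w) {            # letter trigrams with boundary markers
  w <- paste0("#", w, "#")
  sapply(1:(nchar(w) - 2), function(i) substr(w, i, i + 2))
}

indicator_matrix <- function(items, inventory) {
  # binary matrix: one row per token, one column per feature in the inventory
  t(sapply(items, function(x) as.numeric(inventory %in% x)))
}

# one toy sentence of Table 8, with its lexomes (JohnClark spans two word forms)
words   <- c("John", "Clark", "wrote", "great", "books", "about", "ants")
lexomes <- c("JohnClark", "JohnClark", "write", "great", "book", "about", "ant")

tri_inv <- sort(unique(unlist(lapply(words, letter_trigrams))))
lex_inv <- sort(unique(lexomes))

C <- indicator_matrix(lapply(words, letter_trigrams), tri_inv)  # trigram cue matrix
L <- indicator_matrix(as.list(lexomes), lex_inv)                # lexome matrix

Fo    <- ginv(C) %*% L    # solve C F = L in the least-squares sense
L_hat <- C %*% Fo         # estimated (predicted) semantic vectors
round(cor(t(L_hat), t(L)), 2)   # predicted versus targeted vectors, token by token
```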

Figure 10, upper left panel, illustrates that upon reading the personal name John, there is considerable uncertainty about which named entity is at issue. When subsequently the family name is read (upper right panel), uncertainty is reduced, and John Welsch now receives full support, whereas other named entities have correlations close to zero or even negative correlations. As in the small world of the present toy example Janet is a unique personal name, there is no uncertainty about what named entity is at issue when Janet is read. For the family name Clark on its own, by contrast, John Clark and Janet Clark are both viable options.

For the present example, we assumed that every orthographic word received one fixation. However, depending on whether the eye lands far enough into the word, names such as John Clark and compounds such as graph theory can also be read with a single fixation. For single-fixation reading, learning events should comprise the joint trigrams of the two orthographic words, which are then simultaneously calibrated against the targeted lexome (JohnClark, graphtheory).

Obviously, the true complexities of reading, such as parafoveal preview, are not captured by this example. Nevertheless, as a rough first approximation, this example shows a way forward for integrating time into the discriminative lexicon.

7. General Discussion

We have presented a unified discrimination-driven model for visual and auditory word comprehension and word production: the discriminative lexicon. The layout of this model is visualized in Figure 11. Input (cochlea and retina) and output (typing and speaking) systems are presented in light gray; the representations of the model are shown in dark gray. For auditory comprehension, we make use of frequency band summary (FBS) features, low-level cues that we derive automatically from the speech signal. The FBS features serve as input to the auditory network $F_a$ that maps vectors of FBS features onto the semantic vectors of S. For reading, letter trigrams represent the visual input at a level of granularity that is optimal for a functional model of the lexicon (see [97] for a discussion of lower-level visual features). Trigrams are relatively high-level features compared to the FBS features. For features implementing visual input at a much lower level of visual granularity, using histograms of oriented gradients [146], see Linke et al. [15]. Letter trigrams are mapped by the visual network $F_o$ onto S.

For reading, we also implemented a network, here denoted by $K_a$, that maps trigrams onto auditory targets (auditory verbal images) represented by triphones. These auditory triphones in turn are mapped by network $H_a$ onto the semantic vectors S. This network is motivated not only by our observation that for predicting visual lexical decision latencies, triphones outperform trigrams by a wide margin. Network $H_a$ is also part of the control loop for speech production; see Hickok [147] for discussion of this control loop and the necessity of auditory targets in speech production. Furthermore, the acoustic durations with which English speakers realize the stem vowels of English verbs [148], as well as the duration of word-final s in English [149],


Table 8: Example sentences for the reading of proper names. To keep the example simple, inflectional lexomes are not taken into account.

Sentences:
John Clark wrote great books about ants
John Clark published a great book about dangerous ants
John Welsch makes great photographs of airplanes
John Welsch has a collection of great photographs of airplanes landing
John Wiggam will build great harpsichords
John Wiggam will be a great expert in tuning harpsichords
Anne Hastie has taught statistics to great second year students
Anne Hastie has taught probability to great first year students
Janet Clark teaches graph theory to second year students

Corresponding lexomes:
JohnClark write great book about ant
JohnClark publish a great book about dangerous ant
JohnWelsch make great photograph of airplane
JohnWelsch have a collection of great photograph airplane landing
JohnWiggam future build great harpsichord
JohnWiggam future be a great expert in tuning harpsichord
AnneHastie have teach statistics to great second year student
AnneHastie have teach probability to great first year student
JanetClark teach graphtheory to second year student

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark.

can be predicted from ndl networks using auditory targets to discriminate between lexomes [150].
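For concreteness, the chaining of the networks $K_a$ and $H_a$ described above can be sketched as follows; Co, Ta, and S stand for word-by-trigram, word-by-triphone, and word-by-semantic-dimension matrices assumed to be available, and, as for the other networks in this study, the mappings are estimated here with the generalized inverse.

```r
library(MASS)   # for ginv()

estimate_mapping <- function(X, Y) ginv(X) %*% Y   # least-squares solution of X B = Y

comprehend_via_triphones <- function(Co, Ta, S) {
  Ka <- estimate_mapping(Co, Ta)   # trigrams  -> auditory targets (triphones)
  Ha <- estimate_mapping(Ta, S)    # triphones -> semantic vectors
  Co %*% Ka %*% Ha                 # predicted semantics for written input
}
```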

Speech production is modeled by network $G_a$, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters that in turn drive the articulators can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well-understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated (the model as outlined in Figure 11 is a modular one, for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored


Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147]). In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
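By way of illustration, the sketch below applies the Widrow-Hoff rule to a single cue-outcome learning event; the function name, the learning rate, and the toy vectors are our own illustrative assumptions rather than the paper's implementation.

```r
# incremental Widrow-Hoff (delta rule) update for one learning event:
# W is the current weight matrix, 'cue' a row vector of input features,
# 'target' a row vector of desired outputs (e.g., a semantic vector)
widrow_hoff_update <- function(W, cue, target, rate = 0.01) {
  predicted <- cue %*% W                        # current prediction
  W + rate * t(cue) %*% (target - predicted)    # nudge weights toward the target
}

# toy usage: two cues, a three-dimensional outcome, repeated learning events
W   <- matrix(0, nrow = 2, ncol = 3)
cue <- matrix(c(1, 1), nrow = 1)
tgt <- matrix(c(0.2, -0.1, 0.5), nrow = 1)
for (i in 1:100) W <- widrow_hoff_update(W, cue, tgt)
round(cue %*% W, 3)   # gradually approaches the target vector
```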

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapivissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension (we note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research).

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds; learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the tasa corpus, which with only 10 million words is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change


considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the $C_a$ matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix $F_a$, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it to be an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production: when paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and their offshoots in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning on the other.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, an omission that has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network $\mathbf{F}_a$ is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are on both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear: networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, $a$ and $b$, with coordinates $(3, 4)$ and $(-2, -3)$. Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the $x$ coordinate of a point next to the $y$ coordinate:

$$\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A.1}$$


Figure 12: Points $a$ and $b$ can be transformed into points $x$ and $y$ by a linear transformation (both axes run from $-4$ to $4$).

We can represent $\mathbf{a}$ and $\mathbf{b}$ jointly by placing them together in a matrix, with $\mathbf{a}$ as the first row and $\mathbf{b}$ as the second. For matrices, we use capital letters in bold font:

$$\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A.2}$$

In addition to the points $a$ and $b$, Figure 12 also shows two other data points, $x$ and $y$. The matrix for these two points is

$$\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A.3}$$

Let us assume that points $a$ and $b$ represent the forms of two words, $\omega_1$ and $\omega_2$, and that the points $x$ and $y$ represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points $a$ and $b$ into points $x$ and $y$, and likewise in how to transform points $x$ and $y$ back into points $a$ and $b$. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix $\mathbf{A}$, which we want to transform into matrix $\mathbf{B}$. A way of mapping $\mathbf{A}$ onto $\mathbf{B}$ is by means of a linear transformation using matrix multiplication. For $2 \times 2$ matrices, matrix multiplication is defined as follows:
$$\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A.4}$$

Such a mapping of $\mathbf{A}$ onto $\mathbf{B}$ is given by a matrix $\mathbf{F}$,
$$\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A.5}$$

and it is straightforward to verify that indeed
$$\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A.6}$$

Using matrix notation, we can write
$$\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A.7}$$
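The following few lines of R (the language of the packages used in this study) verify (A.6)/(A.7) numerically; the object Fm simply stands for the matrix $\mathbf{F}$ and is named so as not to mask R's built-in shorthand F for FALSE.

A  <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
Fm <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)   # the transformation matrix F
A %*% Fm    # matrix multiplication: reproduces B = (-5 2; 4 -1)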

Given $\mathbf{A}$ and $\mathbf{B}$, how do we obtain $\mathbf{F}$? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix $\mathbf{I}$ leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A.8}$$

For numbers, the inverse of multiplication with $s$ is dividing by $s$:
$$\left(\frac{1}{s}\right)(sx) = x. \tag{A.9}$$

For matrices, the inverse of a square matrix $\mathbf{X}$ is that matrix $\mathbf{X}^{-1}$ such that their product is the identity matrix:
$$\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A.10}$$

For nonsquare matrices, the inverse $\mathbf{Y}^{-1}$ is defined such that
$$\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A.11}$$

We find the square matrix $\mathbf{F}$ that maps the square matrix $\mathbf{A}$ onto $\mathbf{B}$ as follows:
$$\begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}. \end{aligned} \tag{A.12}$$

Since for the present example $\mathbf{A}$ and its inverse happen to be identical ($\mathbf{A}^{-1} = \mathbf{A}$), we obtain (A.6).
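In R, the same solution can be obtained with solve(), which returns the inverse of a nonsingular square matrix; the toy matrices are those of (A.2) and (A.3).

A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
solve(A)              # the inverse of A (here identical to A itself)
Fm <- solve(A) %*% B  # F = A^{-1} B, cf. (A.12)
Fm                    # (1 2; -2 -1)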

$\mathbf{F}$ maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given $\mathbf{B}$, we want to find that matrix $\mathbf{G}$ which maps $\mathbf{B}$ onto $\mathbf{A}$, i.e.,
$$\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A.13}$$

Solving this equation proceeds exactly as above:
$$\begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}. \end{aligned} \tag{A.14}$$


The inverse of $\mathbf{B}$ is
$$\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix}, \tag{A.15}$$

and hence we have
$$\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\ \frac{4}{3} & \frac{5}{3} \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix} = \mathbf{G}. \tag{A.16}$$

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix $\mathbf{X}$ by $\mathbf{X}'$.
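A short sketch of this in R: MASS::ginv recovers the ordinary inverse when it exists, and still returns a (generalized) inverse for a singular matrix, where solve() fails. The singular matrix X below is a made-up example.

library(MASS)
A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
ginv(B)               # equals solve(B) here, i.e., (1/3 2/3; 4/3 5/3)
G <- ginv(B) %*% A    # reproduces G of (A.16)
X <- matrix(c(1, 2, 2, 4), nrow = 2, byrow = TRUE)   # rank 1, hence singular
ginv(X)               # defined even though solve(X) throws an error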

The examples presented here are restricted to $2 \times 2$ matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices $\mathbf{X}$ and $\mathbf{Y}$, the only constraint is that the number of columns of $\mathbf{X}$ is the same as the number of rows of $\mathbf{Y}$. The resulting matrix has the same number of rows as $\mathbf{X}$ and the same number of columns as $\mathbf{Y}$.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of $\mathbf{A}$ can represent words' form features, and the row vectors of $\mathbf{B}$ the corresponding semantic features. The first row vector of $\mathbf{A}$ is mapped onto the first row vector of $\mathbf{B}$ as follows:
$$(3\ \ 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5\ \ 2). \tag{A.17}$$

A transformation matrix such as $\mathbf{F}$ can be represented as a two-layer network. The network corresponding to $\mathbf{F}$ is shown in Figure 13. When the input vector $(3, 4)$ is presented to the network, the nodes $i_1$ and $i_2$ are set to 3 and 4, respectively. To obtain the activations $o_1 = -5$ and $o_2 = 2$ for the output vector $(-5, 2)$, we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for $o_1$ we obtain the value $3 \times 1 + 4 \times -2 = -5$, and for $o_2$ we have $3 \times 2 + 4 \times -1 = 2$. In other words, the network maps the input vector $(3, 4)$ onto the output vector $(-5, 2)$.

An important property of linear maps is that they are productive. We can present a novel form vector, say
$$\mathbf{s} = (2, 2), \tag{A.18}$$
to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
$$(2\ \ 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2\ \ 2). \tag{A.19}$$
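The same productivity is a one-liner in R; the novel vector s is taken from (A.18), and Fm again stands for the transformation matrix of (A.5).

Fm <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)
s  <- matrix(c(2, 2), nrow = 1)   # a form vector not seen when F was set up
s %*% Fm                          # yields the novel semantic vector (-2, 2)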

Figure 13: The linear network corresponding to the transformation matrix $\mathbf{F}$. There are two input nodes, $i_1$ and $i_2$ (in blue; this is where the form features go in), and two output nodes, $o_1$ and $o_2$ (in red; this is where the semantic vectors come out); the connection weights are the transformation matrix. When the input vector $(3, 4)$ is presented to the network, the value at $o_1$ is $3 \times 1 + 4 \times -2 = -5$, and that at $o_2$ is $3 \times 2 + 4 \times -1 = 2$. The network maps the point $(3, 4)$ onto the point $(-5, 2)$.

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol) to a word's final triphone (the triphone ending with the word boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
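A minimal sketch of the core path-finding step in R with igraph is given below. The toy triphone inventory and the support values are made up for illustration, and the thresholding by stem, whole-word, and affix support described above is omitted; only the construction of the overlap graph, the call to all_simple_paths, and the length-weighted ranking follow the procedure described here.

library(igraph)

# Toy triphones for one word, '#' marking the word boundary; support values are invented.
triphones <- c("#wa", "wal", "alk", "lk#")
support   <- c(0.40, 0.35, 0.30, 0.25)
names(support) <- triphones

# An edge connects two triphones when segments 2-3 of the first equal segments 1-2 of the second.
el <- c()
for (v1 in triphones) for (v2 in triphones)
  if (substr(v1, 2, 3) == substr(v2, 1, 2)) el <- c(el, v1, v2)
g <- graph_from_edgelist(matrix(el, ncol = 2, byrow = TRUE), directed = TRUE)

initial <- triphones[startsWith(triphones, "#")]
final   <- triphones[endsWith(triphones, "#")]

paths <- all_simple_paths(g, from = initial, to = final)
# Weight each path's summed support by its length, then pick the best-supported path.
weighted <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
as_ids(paths[[which.max(weighted)]])   # the triphone path driving articulation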

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85]; for code, see the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
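For concreteness, the sketch below shows how such a heatmap can be produced in R from a matrix of semantic row vectors. The matrix S here is filled with random stand-in values and only five function words are included; in the study itself, the rows are the semantic vectors of the pronouns and prepositions shown in Figure 14.

set.seed(1)
words <- c("we", "us", "she", "her", "across")
S <- matrix(rnorm(length(words) * 20), nrow = length(words),
            dimnames = list(words, NULL))        # toy semantic vectors, one row per word
corrs <- cor(t(S))                               # pairwise correlations of the row vectors
heatmap(corrs, symm = TRUE, margins = c(6, 6))   # cluster and display, cf. Figure 14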


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010

36 Complexity

[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983

Complexity 37

[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005

38 Complexity

[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018

Complexity 39

[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010


Table 8: Example sentences for the reading of proper names; each sentence is followed by its lexome representation. To keep the example simple, inflectional lexomes are not taken into account.

John Clark wrote great books about ants → JohnClark write great book about ant
John Clark published a great book about dangerous ants → JohnClark publish a great book about dangerous ant
John Welsch makes great photographs of airplanes → JohnWelsch make great photograph of airplane
John Welsch has a collection of great photographs of airplanes landing → JohnWelsch have a collection of great photograph airplane landing
John Wiggam will build great harpsichords → JohnWiggam future build great harpsichord
John Wiggam will be a great expert in tuning harpsichords → JohnWiggam future be a great expert in tuning harpsichord
Anne Hastie has taught statistics to great second year students → AnneHastie have teach statistics to great second year student
Anne Hastie has taught probability to great first year students → AnneHastie have teach probability to great first year student
Janet Clark teaches graph theory to second year students → JanetClark teach graphtheory to second year student

Figure 10: Correlations of estimated semantic vectors for John, John Welsch, Janet, and Clark (one panel each) with the targeted semantic vectors of JohnClark, JohnWelsch, JohnWiggam, AnneHastie, and JanetClark. The vertical axes show the Pearson correlation (range $-0.5$ to $1.0$).

can be predicted from ndl networks using auditory targets todiscriminate between lexomes [150]

Speech production is modeled by network $\mathbf{G}_a$, which maps semantic vectors onto auditory targets, which in turn serve as input for articulation. Although current models of speech production assemble articulatory targets from phonemes (see, e.g., [126, 147]), a possibility that is certainly compatible with our model when phones are recontextualized as triphones, we think that it is worthwhile investigating whether a network mapping semantic vectors onto vectors of articulatory parameters, which in turn drive the articulators, can be made to work, especially since many reduced forms are not straightforwardly derivable from the phone sequences of their 'canonical' forms [111, 144].
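To make the estimation of such coupled mappings concrete, the sketch below sets up a toy comprehension network and a toy production network with MASS::ginv, in the same way as the appendix's 2 × 2 example but with word-by-feature matrices. All matrices and their dimensions are invented stand-ins: C holds binary form cues, S random semantic vectors, and the form matrix doubles as the target of production.

library(MASS)
set.seed(1)

# Toy data: three words, four form cues (0/1 indicators), five semantic dimensions.
C <- matrix(c(1, 1, 0, 0,
              0, 1, 1, 0,
              0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("word1", "word2", "word3"), NULL))
S <- matrix(rnorm(3 * 5), nrow = 3, dimnames = list(rownames(C), NULL))

Fa <- ginv(C) %*% S   # comprehension: form cues onto semantic vectors
Ga <- ginv(S) %*% C   # production: semantic vectors onto form targets

all.equal(C %*% Fa, S, check.attributes = FALSE)   # TRUE: semantics recovered
all.equal(S %*% Ga, C, check.attributes = FALSE)   # TRUE: form targets recovered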

We have shown that the networks of our model have reasonable to high production and recognition accuracies, and also generate a wealth of well-supported predictions for lexical processing. Central to all networks is the hypothesis that the relation between form and meaning can be modeled discriminatively with large but otherwise surprisingly simple linear networks, the underlying mathematics of which (linear algebra) is well understood. Given the present results, it is clear that the potential of linear algebra for understanding the lexicon and lexical processing has thus far been severely underestimated. (The model as outlined in Figure 11 is a modular one for reasons of computational convenience and interpretability. The different networks probably interact (see for some evidence [47]). One such interaction is explored in Baayen et al. [85]: in their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study. (The systems shown are typing and speaking as output and the cochlea and retina as input; the representations are orthographic targets (trigrams), auditory targets (triphones), semantic vectors (TASA), auditory cues (FBS features), and orthographic cues (trigrams).)

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over a thousand function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no less than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space $\mathbf{S}$, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension (we note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research).

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the $\mathbf{C}_a$ matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix $\mathbf{F}_a$, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep the problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding; the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40, 41, 42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, has likely rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network $\mathbf{F}_a$ is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large, but otherwise straightforward and mathematically well-understood, two-layer linear networks (to be clear: networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, $a$ and $b$, with coordinates $(3, 4)$ and $(-2, -3)$. Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lowercase letters in bold, and we place the $x$ coordinate of a point next to the $y$ coordinate:

$$\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A1}$$


Figure 12: Points $a$ and $b$ can be transformed into points $x$ and $y$ by a linear transformation.

We can represent $\mathbf{a}$ and $\mathbf{b}$ jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

$$\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2}$$

In addition to the points $a$ and $b$, Figure 12 also shows two other data points, $x$ and $y$. The matrix for these two points is

$$\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3}$$

Let us assume that points $a$ and $b$ represent the forms of two words $\omega_1$ and $\omega_2$, and that the points $x$ and $y$ represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points $a$ and $b$ into points $x$ and $y$, and likewise, in how to transform points $x$ and $y$ back into points $a$ and $b$. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix $\mathbf{A}$, which we want to transform into matrix $\mathbf{B}$. A way of mapping $\mathbf{A}$ onto $\mathbf{B}$ is by means of a linear transformation using matrix multiplication. For $2 \times 2$ matrices, matrix multiplication is defined as follows:
$$\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4}$$

Such a mapping of $\mathbf{A}$ onto $\mathbf{B}$ is given by a matrix $\mathbf{F}$,

$$\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5}$$

and it is straightforward to verify that indeed
$$\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6}$$

Using matrix notation, we can write

$$\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A7}$$

Given $\mathbf{A}$ and $\mathbf{B}$, how do we obtain $\mathbf{F}$? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix $\mathbf{I}$ leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8}$$

For numbers, the inverse of multiplication with $s$ is dividing by $s$:
$$\left(\frac{1}{s}\right)(sx) = x. \tag{A9}$$

For matrices, the inverse of a square matrix $\mathbf{X}$ is that matrix $\mathbf{X}^{-1}$ such that their product is the identity matrix:

$$\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10}$$

For nonsquare matrices, the inverse $\mathbf{Y}^{-1}$ is defined such that

$$\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11}$$

We find the square matrix $\mathbf{F}$ that maps the square matrices $\mathbf{A}$ onto $\mathbf{B}$ as follows:

$$\begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}. \end{aligned} \tag{A12}$$

Since for the present example $\mathbf{A}$ and its inverse happen to be identical ($\mathbf{A}^{-1} = \mathbf{A}$), we obtain (A6).
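The same computation can be carried out in R; the following lines are included only as an illustration of the worked example, reproducing (A5), (A6), and (A12):

```r
## The comprehension mapping F = A^{-1} B for the matrices of (A2) and (A3).
A  <- matrix(c(3, -2, 4, -3), nrow = 2)  # rows are the form vectors a and b
B  <- matrix(c(-5, 4, 2, -1), nrow = 2)  # rows are the semantic vectors x and y
Fm <- solve(A) %*% B                     # F of (A5); named Fm to avoid masking R's F/FALSE
A %*% Fm                                 # reproduces B, cf. (A6) and (A7)
```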

$\mathbf{F}$ maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given $\mathbf{B}$, we want to find that matrix $\mathbf{G}$ which maps $\mathbf{B}$ onto $\mathbf{A}$, i.e.,

$$\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A13}$$

Solving this equation proceeds exactly as above:

$$\begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}. \end{aligned} \tag{A14}$$


The inverse of $\mathbf{B}$ is
$$\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}, \tag{A15}$$

and hence we have
$$\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G}. \tag{A16}$$
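The production mapping can be verified in the same way; again, this is only an illustration of the worked example, not code from the study:

```r
## The production mapping G = B^{-1} A of (A13)-(A16).
A <- matrix(c(3, -2, 4, -3), nrow = 2)
B <- matrix(c(-5, 4, 2, -1), nrow = 2)
G <- solve(B) %*% A   # equals the matrix of (A16)
B %*% G               # reproduces A: semantic vectors are mapped back onto form vectors
```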

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix $\mathbf{X}$ by $\mathbf{X}'$.
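As an illustration of the generalized inverse (with a made-up singular matrix, not one of the matrices of this study), the ginv function of the MASS package can be used where solve would fail:

```r
## Moore-Penrose generalized inverse for a singular matrix.
library(MASS)
S  <- matrix(c(1, 2, 2, 4), nrow = 2)  # singular: the second row is twice the first
## solve(S) would throw an error, but the generalized inverse exists:
Sp <- ginv(S)                          # S' in the notation of this study
round(S %*% Sp %*% S, 10)              # returns S, the defining property of the pseudoinverse
```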

The examples presented here are restricted to $2 \times 2$ matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices $\mathbf{X}$ and $\mathbf{Y}$, the only constraint is that the number of columns of $\mathbf{X}$ is the same as the number of rows of $\mathbf{Y}$. The resulting matrix has the same number of rows as $\mathbf{X}$ and the same number of columns as $\mathbf{Y}$.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of $\mathbf{A}$ can represent words' form features, and the row vectors of $\mathbf{B}$ the corresponding semantic features. The first row vector of $\mathbf{A}$ is mapped onto the first row vector of $\mathbf{B}$ as follows:
$$(3, 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5, 2). \tag{A17}$$

A transformation matrix such as $\mathbf{F}$ can be represented as a two-layer network. The network corresponding to $\mathbf{F}$ is shown in Figure 13. When the input vector $(3, 4)$ is presented to the network, the nodes $i_1$ and $i_2$ are set to 3 and 4, respectively. To obtain the activations $o_1 = -5$ and $o_2 = 2$ for the output vector $(-5, 2)$, we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for $o_1$ we obtain the value $3 \times 1 + 4 \times (-2) = -5$, and for $o_2$ we have $3 \times 2 + 4 \times (-1) = 2$. In other words, the network maps the input vector $(3, 4)$ onto the output vector $(-5, 2)$.

An important property of linear maps is that they are productive. We can present a novel form vector, say

$$\mathbf{s} = (2, 2), \tag{A18}$$

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
$$(2, 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2, 2). \tag{A19}$$
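In R, the productivity of the linear map amounts to a single matrix multiplication with the novel form vector (again only an illustration of the worked example):

```r
## Mapping the novel form vector s = (2, 2) of (A18) onto a semantic vector.
Fm <- matrix(c(1, -2, 2, -1), nrow = 2)  # the transformation matrix F of (A5)
s  <- c(2, 2)
s %*% Fm                                 # (-2, 2), cf. (A19)
```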

[Figure 13 appears here: a two-layer network with input nodes $i_1$ and $i_2$ (form features go in here), output nodes $o_1$ and $o_2$ (semantic vectors come out here), and connection weights 1, 2, $-2$, $-1$, which constitute the transformation matrix.]

Figure 13: The linear network corresponding to the transformation matrix $\mathbf{F}$. There are two input nodes, $i_1$ and $i_2$ (in blue), and two output nodes, $o_1$ and $o_2$ (in red). When the input vector $(3, 4)$ is presented to the network, the value at $o_1$ is $3 \times 1 + 4 \times (-2) = -5$ and that at $o_2$ is $3 \times 2 + 4 \times (-1) = 2$. The network maps the point $(3, 4)$ onto the point $(-5, 2)$.

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary marker) to a word's final triphone (the triphone ending with the word boundary marker) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.
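The core of the path-finding step can be illustrated with a small self-contained R sketch; the triphone inventory, the support values, and the graph here are invented for illustration, and the sketch omits the stem/affix thresholding of the full algorithm:

```r
## Toy illustration of graph-based triphone sequencing with igraph.
## '#' marks the word boundary; support values are invented.
library(igraph)

support <- c("#ka" = 0.60, "kat" = 0.55, "at#" = 0.50,
             "kap" = 0.20, "ap#" = 0.15)

## Two triphones are connected when the last two segments of the first
## match the first two segments of the second.
overlaps <- function(t1, t2) substr(t1, 2, 3) == substr(t2, 1, 2)
pairs <- expand.grid(from = names(support), to = names(support),
                     stringsAsFactors = FALSE)
edges <- pairs[mapply(overlaps, pairs$from, pairs$to), ]
g <- graph_from_data_frame(edges, directed = TRUE)

## Initial vertices start with '#', final vertices end in '#'.
initial <- names(support)[startsWith(names(support), "#")]
final   <- names(support)[endsWith(names(support), "#")]

## Enumerate all simple paths from initial to final vertices and
## weight each path's summed support by its length.
paths <- unlist(lapply(initial, function(v)
  all_simple_paths(g, from = v, to = final)), recursive = FALSE)
weighted <- sapply(paths, function(p) sum(support[names(p)]) / length(p))
names(paths[[which.max(weighted)]])  # best path: "#ka" "kat" "at#"
```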

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.

[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018


[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic Morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010


[Figure 11 here. Input and output systems: cochlea, retina, typing, speaking. Representations: auditory cues (FBS features), orthographic cues (trigrams), semantic vectors (TASA), auditory targets (triphones), orthographic targets (trigrams). Networks labeling the arcs: Fa, Fo, Ca, Co, Ga, Go, Ha, Ka, Ta, To, S.]

Figure 11: Overview of the discriminative lexicon. Input and output systems are presented in light gray; the representations of the model are shown in dark gray. Each of these representations is a matrix with its own set of features. Arcs are labeled with the discriminative networks (transformation matrices) that map matrices onto each other. The arc from semantic vectors to orthographic vectors is in gray, as this mapping is not explored in the present study.

in Baayen et al. [85]. In their production model for Latin inflection, feedback from the projected triphone paths back to the semantics (synthesis by analysis) is allowed, so that, of otherwise equally well-supported paths, the path that best expresses the targeted semantics is selected. (For information flows between systems in speech production, see Hickok [147].) In what follows, we touch upon a series of issues that are relevant for evaluating our model.

Incremental learning. In the present study, we estimated the weights on the connections of these networks with matrix operations, but importantly, these weights can also be estimated incrementally, using the learning rule of Widrow and Hoff [44]; further improvements in accuracy are expected when using the Kalman filter [151]. As all matrices can be updated incrementally (and this holds as well for the matrix with semantic vectors, which are also time-variant), and in theory should be updated incrementally whenever information about the order of learning events is available, the present theory has potential for studying lexical acquisition and the continuing development of the lexicon over the lifetime [79].
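To make this concrete, the following R fragment sketches trial-by-trial updating of a linear network with the Widrow-Hoff learning rule; the cue matrix, the semantic matrix, the learning rate, and the number of learning passes are toy values chosen for exposition only and are not taken from our datasets.

# Toy illustration of incremental Widrow-Hoff updating of a linear network
# mapping form vectors (rows of C) onto semantic vectors (rows of S).
C <- matrix(c(1, 0, 1, 0,
              0, 1, 1, 0,
              1, 1, 0, 1), nrow = 3, byrow = TRUE)    # 3 words, 4 form cues
S <- matrix(c( 1.0,  0.2,
              -0.5,  1.0,
               0.3, -0.7), nrow = 3, byrow = TRUE)    # 3 words, 2 semantic dimensions

widrow_hoff <- function(C, S, eta = 0.1, passes = 500) {
  W <- matrix(0, ncol(C), ncol(S))                    # start with an empty network
  for (pass in seq_len(passes)) {
    for (i in seq_len(nrow(C))) {                     # one learning event per word
      cue    <- C[i, , drop = FALSE]                  # 1 x 4 cue vector
      target <- S[i, , drop = FALSE]                  # 1 x 2 target semantic vector
      W <- W + eta * t(cue) %*% (target - cue %*% W)  # delta-rule update
    }
  }
  W
}

W <- widrow_hoff(C, S)
round(C %*% W - S, 3)    # the prediction error shrinks towards zero as learning proceeds

With an informative ordering of learning events, the same kind of code can be used to trace how the network changes over (simulated) developmental time.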

Morphology without compositional operations. We have shown that our networks are predictive for a range of experimental findings. Important from the perspective of discriminative linguistics is that there are no compositional operations in the sense of Frege and Russell [1, 2, 152]. The work on compositionality in logic has deeply influenced formal linguistics (e.g., [3, 153]) and has led to the belief that the "architecture of the language faculty" is grounded in a homomorphism between a calculus (or algebra) of syntactic representations and a calculus (or algebra) based on semantic primitives. Within this tradition, compositionality arises when rules combining representations of form are matched with rules combining representations of meaning.


The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures (typically represented by graphs constructed over semantic primitives; see, e.g., [154]) are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the south-east of Papua New Guinea). The language has a substantial inventory of over 1,000 function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no fewer than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented, and made to work, only because a Latin verb form such as sapīvissēmus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension. (We note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research.)
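For concreteness, the following R fragment shows, with made-up four-dimensional vectors standing in for rows of the semantic matrix S, how such a conjoint semantic vector can be obtained by summation.

# Toy semantic vectors for the content lexome and the inflectional lexomes
# expressed by sapivissemus; all numbers are invented for illustration.
S <- rbind(
  know        = c( 0.8, -0.1,  0.3,  0.0),
  first       = c( 0.1,  0.5, -0.2,  0.1),
  plural      = c(-0.3,  0.2,  0.4, -0.1),
  active      = c( 0.0, -0.4,  0.1,  0.6),
  pluperfect  = c( 0.2,  0.1, -0.5,  0.3),
  subjunctive = c(-0.1,  0.3,  0.2, -0.4)
)
s_sapivissemus <- colSums(S)   # conjoint semantic vector of the inflected form
s_sapivissemus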

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to very efficiently update network weights during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study, we have obtained good results with a semantic matrix with a dimensionality of around 4,000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4,000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the Ca matrix specifies, for each auditory word form (exemplar), its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix Fa, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning), nor morphological form units such as stems and exponents, of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network Fa is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are on both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and a dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lowercase letters in bold, and we place the x coordinate of a point next to the y coordinate:

\[
\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3).
\tag{A.1}
\]


[Figure 12 here: a two-dimensional plot with x and y axes running from -4 to 4, showing the points discussed in the text.]

Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\[
\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}.
\tag{A.2}
\]

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\[
\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.
\tag{A.3}
\]

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:
\[
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}.
\tag{A.4}
\]

Such a mapping of A onto B is given by a matrix F

\[
\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix},
\tag{A.5}
\]

and it is straightforward to verify that indeed
\[
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}.
\tag{A.6}
\]
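Readers who wish to check (A.6) numerically can do so with R's matrix product %*%; the transformation matrix is called Fm below only to avoid masking R's built-in shorthand F for FALSE.

A  <- matrix(c(3, -2, 4, -3), nrow = 2)   # columns are filled first: rows (3, 4) and (-2, -3)
Fm <- matrix(c(1, -2, 2, -1), nrow = 2)   # the transformation matrix of (A.5)
A %*% Fm                                  # returns the matrix B of (A.3)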

Using matrix notation we can write

\[
\mathbf{A}\mathbf{F} = \mathbf{B}.
\tag{A.7}
\]

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}.
\tag{A.8}
\]

For numbers, the inverse of multiplication with s is dividing by s:
\[
\left(\frac{1}{s}\right)(sx) = x.
\tag{A.9}
\]

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\[
\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}.
\tag{A.10}
\]

For nonsquare matrices, the inverse Y^{-1} is defined such that

\[
\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}.
\tag{A.11}
\]

We find the square matrix F that maps the square matrices A onto B as follows:

\[
\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B}, \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}, \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}, \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned}
\tag{A.12}
\]

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A.6).

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find the matrix G which maps B onto A, i.e.,

\[
\mathbf{B}\mathbf{G} = \mathbf{A}.
\tag{A.13}
\]

Solving this equation proceeds exactly as above

\[
\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A}, \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}, \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}, \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned}
\tag{A.14}
\]


The inverse of B is

\[
\mathbf{B}^{-1} = \begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\[2pt] \frac{4}{3} & \frac{5}{3} \end{pmatrix},
\tag{A.15}
\]

and hence we have

\[
\begin{pmatrix} \frac{1}{3} & \frac{2}{3} \\[2pt] \frac{4}{3} & \frac{5}{3} \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
=
\begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} \\[2pt] \frac{2}{3} & \frac{1}{3} \end{pmatrix}
= \mathbf{G}.
\tag{A.16}
\]

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
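The worked example of this appendix can be reproduced in R as follows; solve() returns the exact inverse of the invertible 2 × 2 matrices used here, whereas ginv() from the MASS package returns the Moore-Penrose generalized inverse used for the singular matrices of the main text.

library(MASS)                             # provides ginv()

A <- matrix(c(3, -2, 4, -3), nrow = 2)    # form vectors as rows
B <- matrix(c(-5, 4, 2, -1), nrow = 2)    # semantic vectors as rows

Fm <- solve(A) %*% B                      # comprehension mapping, F = A^{-1} B, cf. (A.12)
G  <- solve(B) %*% A                      # production mapping,    G = B^{-1} A, cf. (A.14)

A %*% Fm                                  # reproduces B
B %*% G                                   # reproduces A

ginv(A) %*% B                             # the generalized inverse gives the same mapping
                                          # here, and remains defined when A is singular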

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:
\[
(3 \;\; 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5 \;\; 2).
\tag{A.17}
\]

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × (-2) = -5, and for o2 we have 3 × 2 + 4 × (-1) = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say,

\[
\mathbf{s} = (2, 2),
\tag{A.18}
\]

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
\[
(2 \;\; 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2 \;\; 2).
\tag{A.19}
\]

[Figure 13 here: a two-layer network with input nodes i1 and i2 ("form features go in here"), output nodes o1 and o2 ("semantic vectors come out here"), and connection weights 1, 2, -2, -1 ("connection weights are the transformation matrix").]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × (-2) = -5, and that at o2 is 3 × 2 + 4 × (-1) = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with #) to a word's final triphone (the triphone ending with #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
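The following R fragment sketches the core of this procedure for a single toy word; the triphones and their support values are invented for illustration, while all_simple_paths is the igraph function referred to above, and the weighting of path support by path length follows the description in the text.

library(igraph)

# toy triphones for the word form "cat", with # marking the word boundary;
# the support values are invented for illustration
triphones <- c("#ca", "cat", "at#")
support   <- c("#ca" = 0.6, "cat" = 0.8, "at#" = 0.7)

# connect two triphones when the last two segments of the first are
# identical to the first two segments of the second
edges <- c()
for (v in triphones) {
  for (w in triphones) {
    if (v != w && substr(v, 2, 3) == substr(w, 1, 2)) {
      edges <- c(edges, v, w)
    }
  }
}
g <- make_graph(edges, directed = TRUE)

initial <- triphones[startsWith(triphones, "#")]   # word-initial triphones
final   <- triphones[endsWith(triphones, "#")]     # word-final triphones
paths   <- all_simple_paths(g, from = initial[1], to = final)

# weighted path support: summed support divided by path length;
# the path with maximal weighted support drives articulation
sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))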

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85]; for code, see the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

[Figure 14 here: a heatmap with color key and histogram; the correlation values shown range from approximately -0.1 to 0.25.]

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).
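A figure of this kind can be produced with a few lines of R; this is a minimal sketch with a toy matrix standing in for the relevant rows of the semantic matrix S, using base R's heatmap(), which does not reproduce the color key and histogram of Figure 14.

# Toy sketch: correlations between the semantic vectors (rows) of a few
# function words, displayed as a heatmap.
set.seed(1)
S <- matrix(rnorm(5 * 20), nrow = 5,
            dimnames = list(c("about", "without", "within", "we", "us"), NULL))
r <- cor(t(S))                   # word-by-word correlation matrix
heatmap(r, symm = TRUE, margins = c(6, 6))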


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics. Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record, Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom variation: Experimental data and a blueprint of a computational model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling, et al., "ndl2: Naive discriminative learning: an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The mismeasurement of mind: Life-span changes in paired-associate-learning scores reflect the 'cost' of learning, not cognitive decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of speech perception and phonological processing in reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A spreading-activation theory of retrieval in sentence production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.
[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for text classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal, et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of reading aloud: Dual-route and parallel-distributed-processing approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL. R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostic, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.




The approach of Marelli and Baroni [60], who derive the semantic vector of a complex word from the semantic vector of its base and a dedicated affix-specific mapping from the pertinent base words to the corresponding derived words, is in the spirit of compositionality, in that a change in form (through affixation) goes hand in hand with a change in meaning (modeled with a linear mapping in semantic space).

The perspective on morphological processing developed in the present study breaks with this tradition. Although semantic vectors for inflected words are obtained by summation of the semantic vectors of the base word and those of its inflectional functions (see [85] for modeling of the rich verbal paradigms of Latin), the form vectors corresponding to the resulting semantic vectors are not obtained by the summation of vectors for chunks of form, nor by any other operations on forms. Attempts to align chunks of words' form with their semantic structures, typically represented by graphs constructed over semantic primitives (see, e.g., [154]), are destined to end in quagmires of arbitrary decisions about the number of inflectional classes required (Estonian: 30 or 300?), what the chunks should look like (does German have a morpheme d in its articles and demonstratives der, die, das, diese, ...?), whether Semitic morphology requires operations on form on one or two tiers [155–157], and how many affixal position slots are required for Athabaskan languages such as Navajo [158]. A language that is particularly painful for compositional theories is Yélî Dnye, an Indo-Pacific language spoken on Rossel Island (in the southeast of Papua New Guinea). The language has a substantial inventory of over a thousand function words used for verb agreement. Typically, these words are monosyllables that simultaneously specify values for negation, tense, aspect, person and number of subject, deixis, evidentiality, associated motion, and counterfactuality. Furthermore, verbs typically have no fewer than eight different stems, the use of which depends on particular constellations of inflectional functions [159, 160]. Yélî Dnye thus provides a striking counterexample to the hypothesis that a calculus of meaning would be paralleled by a calculus of form.

Whereas the theory of linear discriminative learning radically rejects the idea that morphological processing is grounded in compositionality as defined in mainstream formal linguistic theory, it does allow for the conjoint expression in form of lexical meanings and inflectional functions, and for forms to map onto such conjoint semantic vectors. But just as multiple inflectional functions can be expressed jointly in a single word, multiple words can map onto nonconjoint semantic vectors, as was illustrated above for named entity recognition: John Wiggam can be understood to refer to a very specific person without this person being a compositional function of the proper name John and the surname Wiggam. In both these cases, there is no isomorphy between inflectional functions and lexical meanings on the one hand and chunks of form on the other hand.

Of course, conjoint vectors for lexical meanings and inflectional functions can be implemented and made to work only because a Latin verb form such as sapīvissemus is understood, when building the semantic space S, to express lexomes for person (first), number (plural), voice (active), tense (pluperfect), mood (subjunctive), and lexical meaning (to know) (see [85] for computational modeling details). Although this could loosely be described as a decompositional approach to inflected words, we prefer not to use the term 'decompositional', as in formal linguistics this term, as outlined above, has a very different meaning. A more adequate terminology would be that the system is conjoint realizational for production and conjoint inferential for comprehension (we note here that we have focused on temporally conjoint realization and inference, leaving temporally disjoint realization and inference, for sequences of words and possibly compounds and fixed expressions, for further research).

Naive versus linear discriminative learning. There is one important technical difference between the learning engine of the current study, ldl, and naive discriminative learning (ndl). Because ndl makes the simplifying assumption that outcomes are orthogonal, the weights from the cues to a given outcome are independent of the weights from the cues to the other outcomes. This independence assumption, which motivates the word naive in naive discriminative learning, makes it possible to update network weights very efficiently during incremental learning. In ldl, by contrast, this independence no longer holds: learning is no longer naive. We therefore refer to our networks not as ndl networks, but simply as linear discriminative learning networks. Actual incremental updating of ldl networks is computationally more costly, but further numerical optimization is possible, and implementations are underway.

Scaling. An important property of the present approach is that good results are obtained already for datasets of modest size. Our semantic matrix is derived from the TASA corpus, which, with only 10 million words, is dwarfed by the 2.8 billion words of the ukWaC corpus used by Marelli and Baroni [60], more words than any individual speaker can ever encounter in their lifetime. Similarly, Arnold et al. [18] report good results for word identification when models are trained on 20 hours of speech, which contrasts with the huge volumes of speech required for training deep learning networks for speech recognition. Although performance increases somewhat with more training data [161], it is clear that considerable headway can be made with relatively moderate volumes of speech. It thus becomes feasible to train models on, for instance, Australian English and New Zealand English, using already existing speech corpora, and to implement networks that make precise quantitative predictions for how understanding is mediated by listeners' expectations about their interlocutors (see, e.g., [162]).

In this study we have obtained good results with a semantic matrix with a dimensionality of around 4,000 lexomes. By itself, we find it surprising that a semantic vector space can be scaffolded with a mere 4,000 words. But this finding may also shed light on the phenomenon of childhood amnesia [163, 164]. Bauer and Larkina [165] pointed out that young children have autobiographical memories that they may not remember a few years later. They argue that the rate of forgetting diminishes as we grow older and stabilizes in adulthood. Part of this process of forgetting may relate to the changes in the semantic matrix. Crucially, we expect the correlational structure of lexomes to change


considerably over time, as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.
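A minimal R sketch may help make this point concrete. The matrices, the learning rate, and the invented 'true' mapping below are purely illustrative (this is not the code used in the study): each learning event updates the weights with the Widrow-Hoff rule and is then discarded, and after sufficient experience the weights come to match the one-step estimate obtained with the generalized inverse.

    # Toy cue matrix C (one row per learning event) and semantic matrix S.
    set.seed(1)
    C <- matrix(rbinom(6 * 4, 1, 0.5), nrow = 6)        # 6 events, 4 form cues
    F_true <- matrix(rnorm(4 * 3), nrow = 4)            # invented form-to-meaning map
    S <- C %*% F_true                                   # semantic vectors of the events

    Fmat <- matrix(0, nrow = ncol(C), ncol = ncol(S))   # network weights, start at zero
    eta  <- 0.1                                         # learning rate (illustrative)

    for (epoch in 1:1000) {                             # repeated exposure to the events
      for (i in 1:nrow(C)) {
        c_i  <- C[i, , drop = FALSE]                    # cue vector of the current token
        e_i  <- S[i, , drop = FALSE] - c_i %*% Fmat     # prediction error for this token
        Fmat <- Fmat + eta * t(c_i) %*% e_i             # Widrow-Hoff (delta rule) update
      }
    }

    # The token itself is never stored; only the weights change. They end up
    # very close to the generalized-inverse estimate referred to above:
    max(abs(Fmat - MASS::ginv(C) %*% S))                # very small here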

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of

the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many consider it an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks, when presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of


categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words, but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

a = (3, 4),  b = (-2, -3)  (A1)


Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

A = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}  (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

B = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}  (A3)

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}  (A4)

Such a mapping of A onto B is given by a matrix F

F = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}  (A5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}  (A6)

Using matrix notation we can write

AF = B (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}  (A8)

For numbers, the inverse of multiplication with s is dividing by s:

(1/s)(sx) = x  (A9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

X^{-1} X = X X^{-1} = I  (A10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

Y^{-1}(YX) = (XY)Y^{-1} = X  (A11)

We find the square matrix F that maps the square matrices A onto B as follows:

AF = B
A^{-1}AF = A^{-1}B
IF = A^{-1}B
F = A^{-1}B  (A12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).
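As a quick check, this worked example can be reproduced in R (a sketch of the toy case only; solve() computes the ordinary matrix inverse used here):

    # Recover the comprehension mapping F of (A12) from A and B.
    A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)   # form vectors a and b as rows
    B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)   # semantic vectors as rows

    F_map <- solve(A) %*% B    # F = A^-1 B
    F_map                      # rows (1, 2) and (-2, -1): the matrix F of (A5)
    A %*% F_map                # returns B, confirming AF = B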

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

BG = A (A13)

Solving this equation proceeds exactly as above

BG = A
B^{-1}BG = B^{-1}A
IG = B^{-1}A
G = B^{-1}A  (A14)


The inverse of B is

B^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}  (A15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = G  (A16)

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
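For illustration, a small sketch of how a mapping can still be estimated when the relevant matrix is singular; the matrices below are invented for the example, and ginv() is the MASS function mentioned above:

    # A singular matrix: the third row is the sum of the first two, so solve()
    # would fail. The generalized inverse still yields a least-squares mapping
    # from the row vectors of C onto the row vectors of S.
    library(MASS)
    C <- matrix(c(1, 0, 1,
                  0, 1, 1,
                  1, 1, 2), nrow = 3, byrow = TRUE)
    S <- matrix(c( 0.5, -1.0,
                  -0.2,  0.8,
                   0.3, -0.2), nrow = 3, byrow = TRUE)
    F_hat <- ginv(C) %*% S    # plays the role of C'S, with C' the generalized inverse
    C %*% F_hat               # reproduces S (in general: its best least-squares approximation)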

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3  4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5  2)  (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × -2 = -5, and for o2 we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say,

s = (2 2) (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2  2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2  2)  (A19)
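The productivity of the map, and its reading as a two-layer network, can likewise be illustrated with a few lines of R (again only a sketch of the toy example above):

    # The transformation F of (A5), applied to the novel form vector s of (A18).
    F_map <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)
    s <- c(2, 2)

    s %*% F_map               # the novel semantic vector (-2, 2) of (A19)

    # Read as the network of Figure 13: each output node sums the inputs
    # weighted by its incoming connection strengths.
    sum(s * F_map[, 1])       # activation of o1: 2*1 + 2*(-2) = -2
    sum(s * F_map[, 2])       # activation of o2: 2*2 + 2*(-1) =  2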

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × -2 = -5, and that at o2 is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2). The in-figure labels indicate that form features go in at the input nodes, semantic vectors come out at the output nodes, and the connection weights are the transformation matrix.

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol) to a word's final triphone (the triphone ending with the word boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
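The path-ranking step can be sketched in R with igraph. The triphones, their support values, and the use of '#' as word boundary marker below are all invented for illustration, and the vertex-selection thresholds described above are assumed to have been applied already; all_simple_paths is the igraph function referred to in the text.

    library(igraph)

    # Candidate triphones with (invented) network support; '#' marks word boundaries.
    support <- c("#wa" = 0.9, "wal" = 0.8, "alk" = 0.7,
                 "lks" = 0.6, "ks#" = 0.7, "lk#" = 0.3)
    triphones <- names(support)

    # Edges connect triphones whose last two segments match the next one's first two.
    pairs <- expand.grid(from = triphones, to = triphones, stringsAsFactors = FALSE)
    pairs <- pairs[substr(pairs$from, 2, 3) == substr(pairs$to, 1, 2) &
                   pairs$from != pairs$to, ]
    g <- graph_from_data_frame(pairs, vertices = data.frame(name = triphones))

    initial <- triphones[startsWith(triphones, "#")]
    final   <- triphones[endsWith(triphones, "#")]

    paths <- unlist(lapply(initial, function(v) all_simple_paths(g, from = v, to = final)),
                    recursive = FALSE)

    # Weight raw path support by path length and pick the best-supported path.
    weighted <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
    as_ids(paths[[which.max(weighted)]])   # "#wa" "wal" "alk" "lks" "ks#", i.e., "walks"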

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G. Frege, "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought," in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, pp. 1–82, 1967.
[2] B. Russell, An Inquiry into Meaning and Truth, Allen and Unwin, London, UK, 1942.
[3] R. Montague, "The proper treatment of quantification in ordinary English," in Approaches to Natural Language, pp. 221–242, Springer, 1973.
[4] N. Chomsky, Syntactic Structures, vol. 33, Mouton, The Hague, 1957.
[5] W. J. M. Levelt, Formal Grammars in Linguistics and Psycholinguistics: Volume 1: An Introduction to the Theory of Formal Languages and Automata; Volume 2: Applications in Linguistic Theory; Volume 3: Psycholinguistic Applications, John Benjamins Publishing, 2008.
[6] D. F. Kleinschmidt and T. Florian Jaeger, "Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel," Psychological Review, vol. 122, no. 2, pp. 148–203, 2015.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, "Bias in random forest variable importance measures: illustrations, sources and a solution," BMC Bioinformatics, vol. 8, 2007.
[9] A. Hannun, C. Case, J. Casper et al., "Deep speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.
[10] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proceedings of INTERSPEECH, pp. 890–894, 2014.
[11] J. Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[12] T. Hannagan, J. C. Ziegler, S. Dufau, J. Fagot, and J. Grainger, "Deep learning of orthographic representations in baboons," PLoS ONE, vol. 9, no. 1, 2014.
[13] J. Grainger, S. Dufau, M. Montant, J. C. Ziegler, and J. Fagot, "Orthographic processing in baboons (Papio papio)," Science, vol. 336, no. 6078, pp. 245–248, 2012.
[14] D. Scarf, K. Boy, A. U. Reinert, J. Devine, O. Güntürkün, and M. Colombo, "Orthographic processing in pigeons (Columba livia)," Proceedings of the National Academy of Sciences, vol. 113, no. 40, pp. 11272–11276, 2016.
[15] M. Linke, F. Bröker, M. Ramscar, and R. H. Baayen, "Are baboons learning 'orthographic' representations? Probably not," PLoS ONE, vol. 12, no. 8, 2017.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[17] R. H. Baayen and P. Hendrix, "Two-layer networks, non-linear separation, and human learning," in From Semantics to Dialectometry: Festschrift in Honor of John Nerbonne, Tributes 32, M. Wieling, M. Kroon, G. Van Noord, and G. Bouma, Eds., pp. 13–22, College Publications, 2017.
[18] D. Arnold, F. Tomaschek, K. Sering, F. Lopez, and R. H. Baayen, "Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit," PLoS ONE, vol. 12, no. 4, 2017.
[19] P. Zwitserlood, "Processing and representation of morphological complexity in native language comprehension and production," in The Construction of Words, G. E. Booij, Ed., pp. 583–602, Springer, 2018.
[20] G. Booij, "Construction morphology," Language and Linguistics Compass, vol. 4, no. 7, pp. 543–555, 2010.
[21] J. P. Blevins, "Word-based morphology," Journal of Linguistics, vol. 42, no. 3, pp. 531–573, 2006.
[22] G. Stump, Inflectional Morphology: A Theory of Paradigm Structure, Cambridge University Press, 2001.
[23] R. Beard, "On the extent and nature of irregularity in the lexicon," Lingua, vol. 42, no. 4, pp. 305–341, 1977.
[24] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1974.
[25] P. H. Matthews, Morphology: An Introduction to the Theory of Word Structure, Cambridge University Press, Cambridge, 1991.
[26] C. Hockett, "The origin of speech," Scientific American, vol. 203, pp. 89–97, 1960.
[27] M. Erelt, Estonian Language, M. Erelt, Ed., Estonian Academy Publishers, Tallinn, Estonia, 2003.
[28] P. H. Matthews, Grammatical Theory in the United States from Bloomfield to Chomsky, Cambridge University Press, Cambridge, 1993.
[29] N. Chomsky and M. Halle, The Sound Pattern of English, Harper and Row, NY, USA, 1968.
[30] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[31] W. Weaver, "Translation," in Machine Translation of Languages: Fourteen Essays, W. N. Locke and A. D. Booth, Eds., pp. 15–23, MIT Press, Cambridge, 1955.
[32] J. R. Firth, Selected Papers of J. R. Firth 1952-59, Indiana University Press, 1968.
[33] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, and Computers, vol. 28, no. 2, pp. 203–208, 1996.
[34] C. Shaoul and C. Westbury, "Exploring lexical co-occurrence space using HiDEx," Behavior Research Methods, vol. 42, no. 2, pp. 393–413, 2010.


[35] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
[36] A. Roelofs, "The WEAVER model of word-form encoding in speech production," Cognition, vol. 64, no. 3, pp. 249–284, 1997.
[37] W. J. M. Levelt, A. Roelofs, and A. S. Meyer, "A theory of lexical access in speech production," Behavioral and Brain Sciences, vol. 22, no. 1, pp. 1–38, 1999.
[38] M. Taft, "Interactive-activation as a framework for understanding morphological processing," Language and Cognitive Processes, vol. 9, no. 3, pp. 271–294, 1994.
[39] R. Schreuder and R. H. Baayen, "Modeling morphological processing," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., pp. 131–154, Lawrence Erlbaum, Hillsdale, New Jersey, USA, 1995.
[40] B. C. Love, D. L. Medin, and T. M. Gureckis, "SUSTAIN: A network model of category learning," Psychological Review, vol. 111, no. 2, pp. 309–332, 2004.
[41] C. J. Marsolek, "What antipriming reveals about priming," Trends in Cognitive Sciences, vol. 12, no. 5, pp. 176–181, 2008.
[42] M. Ramscar and R. Port, "Categorization (without categories)," in Handbook of Cognitive Linguistics, E. Dabrowska and D. Divjak, Eds., pp. 75–99, De Gruyter, Berlin, Germany, 2015.
[43] R. A. Rescorla and A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement," in Classical Conditioning II: Current Research and Theory, A. H. Black and W. F. Prokasy, Eds., pp. 64–99, Appleton Century Crofts, NY, USA, 1972.
[44] B. Widrow and M. E. Hoff, "Adaptive switching circuits," 1960 WESCON Convention Record Part IV, pp. 96–104, 1960.
[45] R. H. Baayen, R. Schreuder, and R. Sproat, "Morphology in the mental lexicon: A computational model for visual word recognition," in Lexicon Development for Speech and Language Processing, F. van Eynde and D. Gibbon, Eds., pp. 267–291, Kluwer Academic Publishers, 2000.
[46] M. Taft, "A morphological-decomposition model of lexical representation," Linguistics, vol. 26, no. 4, pp. 657–667, 1988.
[47] M. W. Harm and M. S. Seidenberg, "Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes," Psychological Review, vol. 111, no. 3, pp. 662–720, 2004.
[48] P. C. Trimmer, J. M. McNamara, A. Houston, and J. A. R. Marshall, "Does natural selection favour the Rescorla-Wagner rule?" Journal of Theoretical Biology, vol. 302, pp. 39–52, 2012.
[49] R. R. Miller, R. C. Barnet, and N. J. Grahame, "Assessment of the Rescorla-Wagner model," Psychological Bulletin, vol. 117, no. 3, pp. 363–386, 1995.
[50] L. J. Kamin, "Predictability, surprise, attention, and conditioning," in Punishment and Aversive Behavior, B. A. Campbell and R. M. Church, Eds., pp. 276–296, Appleton-Century-Crofts, NY, USA, 1969.
[51] R. A. Rescorla, "Pavlovian conditioning: It's not what you think it is," American Psychologist, vol. 43, no. 3, pp. 151–160, 1988.
[52] M. Ramscar, D. Yarlett, M. Dye, K. Denny, and K. Thorpe, "The effects of feature-label-order and their implications for symbolic learning," Cognitive Science, vol. 34, no. 6, pp. 909–957, 2010.
[53] M. Ramscar and D. Yarlett, "Linguistic self-correction in the absence of feedback: A new approach to the logical problem of language acquisition," Cognitive Science, vol. 31, no. 6, pp. 927–960, 2007.
[54] R. H. Baayen, P. Milin, D. F. Đurđević, P. Hendrix, and M. Marelli, "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning," Psychological Review, vol. 118, no. 3, pp. 438–481, 2011.
[55] F. M. Del Prado Martín, R. Bertram, T. Häikiö, R. Schreuder, and R. H. Baayen, "Morphological family size in a morphologically rich language: The case of Finnish compared to Dutch and Hebrew," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 30, pp. 1271–1278, 2004.
[56] P. Milin, D. F. Đurđević, and F. M. Del Prado Martín, "The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian," Journal of Memory and Language, vol. 60, no. 1, pp. 50–64, 2009.
[57] B. K. Bergen, "The psychological reality of phonaesthemes," Language, vol. 80, no. 2, pp. 290–311, 2004.
[58] P. Milin, L. B. Feldman, M. Ramscar, P. Hendrix, and R. H. Baayen, "Discrimination in lexical decision," PLoS ONE, vol. 12, no. 2, 2017.
[59] R. H. Baayen, P. Milin, and M. Ramscar, "Frequency in lexical processing," Aphasiology, vol. 30, no. 11, pp. 1174–1220, 2016.
[60] M. Marelli and M. Baroni, "Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics," Psychological Review, vol. 122, no. 3, pp. 485–515, 2015.
[61] T. Sering, P. Milin, and R. H. Baayen, "Language comprehension as a multiple label classification problem," Statistica Neerlandica, pp. 1–15, 2018.
[62] R. Kaye and R. Wilson, Linear Algebra, Oxford University Press, 1998.
[63] S. H. Ivens and B. L. Koslin, Demands for Reading Literacy Require New Accountability Methods, Touchstone Applied Science Associates, 1991.
[64] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259–284, 1998.
[65] H. Schmid, "Improvements in part-of-speech tagging with an application to German," in Proceedings of the ACL SIGDAT-Workshop, pp. 13–25, Dublin, Ireland, 1995.
[66] R. H. Baayen, R. Piepenbrock, and L. Gulikers, The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
[67] J. Mitchell and M. Lapata, "Vector-based models of semantic composition," in Proceedings of the ACL, pp. 236–244, 2008.
[68] A. Lazaridou, M. Marelli, R. Zamparelli, and M. Baroni, "Compositionally derived representations of morphologically complex words in distributional semantics," in Proceedings of the ACL (1), pp. 1517–1526, 2013.
[69] R. Cotterell, H. Schütze, and J. Eisner, "Morphological smoothing and extrapolation of word embeddings," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1651–1660, 2016.
[70] B. D. Zeller, S. Padó, and J. Šnajder, "Towards semantic validation of a derivational lexicon," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1728–1739, 2014.
[71] H. Marchand, The Categories and Types of Present-Day English Word Formation: A Synchronic-Diachronic Approach, Beck'sche Verlagsbuchhandlung, München, Germany, 1969.
[72] L. Bauer, English Word-Formation, Cambridge University Press, Cambridge, 1983.


[73] I. Plag, Word-Formation in English, Cambridge University Press, Cambridge, 2003.
[74] R. Beard, Lexeme-Morpheme Base Morphology: A General Theory of Inflection and Word Formation, State University of New York Press, Albany, NY, USA, 1995.
[75] K. Geeraert, J. Newman, and R. H. Baayen, "Idiom Variation: Experimental Data and a Blueprint of a Computational Model," Topics in Cognitive Science, vol. 9, no. 3, pp. 653–669, 2017.
[76] C. Shaoul, S. Bitschau, N. Schilling, et al., "ndl2: Naive discriminative learning, an implementation in R," R package, 2015.
[77] P. Milin, D. Divjak, and R. H. Baayen, "A learning perspective on individual differences in skilled reading: Exploring and exploiting orthographic and semantic discrimination cues," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 43, no. 11, pp. 1730–1751, 2017.
[78] M. Ramscar, P. Hendrix, C. Shaoul, P. Milin, and R. H. Baayen, "Nonlinear dynamics of lifelong learning: the myth of cognitive decline," Topics in Cognitive Science, vol. 6, pp. 5–42, 2014.
[79] M. Ramscar, C. C. Sun, P. Hendrix, and R. H. Baayen, "The Mismeasurement of Mind: Life-Span Changes in Paired-Associate-Learning Scores Reflect the 'Cost' of Learning, Not Cognitive Decline," Psychological Science, vol. 28, no. 8, pp. 1171–1179, 2017.
[80] G. desRosiers and D. Ivison, "Paired associate learning: Form 1 and form 2 of the Wechsler memory scale," Archives of Clinical Neuropsychology, vol. 3, no. 1, pp. 47–67, 1988.
[81] E. Bruni, N. K. Tran, and M. Baroni, "Multimodal distributional semantics," Journal of Artificial Intelligence Research, vol. 49, pp. 1–47, 2014.
[82] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, NY, USA, 2002.
[83] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[84] D. Kastovsky, "Productivity in word formation," Linguistics, vol. 24, no. 3, pp. 585–600, 1986.
[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.
[86] Burnage, CELEX: A Guide for Users, Centre for Lexical Information, Nijmegen, The Netherlands, 1988.
[87] C. McBride-Chang, "Models of Speech Perception and Phonological Processing in Reading," Child Development, vol. 67, no. 4, pp. 1836–1856, 1996.
[88] G. C. Van Orden, H. Kloos, et al., "The question of phonology and reading," in The Science of Reading: A Handbook, pp. 61–78, 2005.
[89] D. Jared and K. O'Donnell, "Skilled adult readers activate the meanings of high-frequency words using phonology: Evidence from eye tracking," Memory & Cognition, vol. 45, no. 2, pp. 334–346, 2017.
[90] G. S. Dell, "A Spreading-Activation Theory of Retrieval in Sentence Production," Psychological Review, vol. 93, no. 3, pp. 283–321, 1986.
[91] A. Marantz, "No escape from morphemes in morphological processing," Language and Cognitive Processes, vol. 28, no. 7, pp. 905–916, 2013.

[92] M. Bozic, W. D. Marslen-Wilson, E. A. Stamatakis, M. H. Davis, and L. K. Tyler, "Differentiating morphology, form, and meaning: Neural correlates of morphological complexity," Cognitive Neuroscience, vol. 19, no. 9, pp. 1464–1475, 2007.
[93] R. F. Port and A. P. Leary, "Against formal phonology," Language, vol. 81, no. 4, pp. 927–964, 2005.
[94] J. P. Blevins, "Stems and paradigms," Language, vol. 79, no. 4, pp. 737–767, 2003.
[95] R. Forsyth and D. Holmes, "Feature-finding for test classification," Literary and Linguistic Computing, vol. 11, no. 4, pp. 163–174, 1996.
[96] H. Pham and R. H. Baayen, "Vietnamese compounds show an anti-frequency effect in visual lexical decision," Language, Cognition and Neuroscience, vol. 30, no. 9, pp. 1077–1095, 2015.
[97] L. Cohen and S. Dehaene, "Ventral and dorsal contributions to word reading," in The Cognitive Neurosciences, M. S. Gazzaniga, Ed., pp. 789–804, Massachusetts Institute of Technology, 2009.
[98] E. Keuleers, P. Lacey, K. Rastle, and M. Brysbaert, "The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words," Behavior Research Methods, vol. 44, no. 1, pp. 287–304, 2012.
[99] R. K. Wagner, J. K. Torgesen, and C. A. Rashotte, "Development of reading-related phonological processing abilities: new evidence of bidirectional causality from a latent variable longitudinal study," Developmental Psychology, vol. 30, no. 1, pp. 73–87, 1994.
[100] K. F. E. Wong and H.-C. Chen, "Orthographic and phonological processing in reading Chinese text: Evidence from eye fixations," Language and Cognitive Processes, vol. 14, no. 5-6, pp. 461–480, 1999.
[101] R. L. Newman, D. Jared, and C. A. Haigh, "Does phonology play a role when skilled readers read high-frequency words? Evidence from ERPs," Language and Cognitive Processes, vol. 27, no. 9, pp. 1361–1384, 2012.
[102] D. Jared, J. Ashby, S. J. Agauas, and B. A. Levy, "Phonological activation of word meanings in grade 5 readers," Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 42, no. 4, pp. 524–541, 2016.
[103] T. Bitan, A. Kaftory, A. Meiri-Leib, Z. Eviatar, and O. Peleg, "Phonological ambiguity modulates resolution of semantic ambiguity during reading: An fMRI study of Hebrew," Neuropsychology, vol. 31, no. 7, pp. 759–777, 2017.
[104] D. Jared and S. Bainbridge, "Reading homophone puns: Evidence from eye tracking," Canadian Journal of Experimental Psychology, vol. 71, no. 1, pp. 2–13, 2017.
[105] S. Amenta, M. Marelli, and S. Sulpizio, "From sound to meaning: Phonology-to-Semantics mapping in visual word recognition," Psychonomic Bulletin & Review, vol. 24, no. 3, pp. 887–893, 2017.
[106] M. Perrone-Bertolotti, J. Kujala, J. R. Vidal, et al., "How silent is silent reading? Intracerebral evidence for top-down activation of temporal voice areas during reading," The Journal of Neuroscience, vol. 32, no. 49, pp. 17554–17562, 2012.
[107] B. Yao, P. Belin, and C. Scheepers, "Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex," Journal of Cognitive Neuroscience, vol. 23, no. 10, pp. 3146–3152, 2011.
[108] M. Coltheart, B. Curtis, P. Atkins, and M. Haller, "Models of Reading Aloud: Dual-Route and Parallel-Distributed-Processing Approaches," Psychological Review, vol. 100, no. 4, pp. 589–608, 1993.
[109] M. Coltheart, "Modeling reading: The dual-route approach," in The Science of Reading: A Handbook, pp. 6–23, 2005.


[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, "The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability," Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.
[111] K. Johnson, "Massive reduction in conversational American English," in Spontaneous Speech: Data and Analysis. Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.
[112] C. Phillips, "Levels of representation in the electrophysiology of speech perception," Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.
[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, "Speech perception," Annual Review of Psychology, vol. 55, pp. 149–179, 2004.
[114] D. Norris and J. M. McQueen, "Shortlist B: a Bayesian model of continuous speech recognition," Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.
[115] S. Hawkins, "Roles and representations of systematic fine phonetic detail in speech understanding," Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.
[116] C. Cucchiarini and H. Strik, "Automatic Phonetic Transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL," R package version 1.0.1, 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables, and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.

[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Bearman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, The Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic Morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide Learning for Auditory Comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.

[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss Functions for Image Restoration With Neural Networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of f1 and f2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol, Ph.D. thesis, University of Alberta, Edmonton, 2010.


considerably over time as experience accumulates. If this is indeed the case, the semantic vectors for the lexomes for young children are different from the corresponding lexomes for older children, which will again differ from those of adults. As a consequence, autobiographical memories that were anchored to the lexomes at a young age can no longer be accessed, as the semantic vectors to which these memories were originally anchored no longer exist.

No exemplars. Given how we estimate the transformation matrices with which we navigate between form and meaning, it might seem that our approach is in essence an implementation of an exemplar-based theory of the mental lexicon. After all, for instance, the C_a matrix specifies for each auditory word form (exemplar) its pertinent frequency band summary features. However, the computational implementation of the present study should not be confused with the underlying algorithmic conceptualization. The underlying conceptualization is that learning proceeds incrementally, trial by trial, with the weights of the transformation networks being updated with the learning rule of Widrow and Hoff [44]. In the mean, trial-by-trial updating with the Widrow-Hoff learning rule results in the same expected connection weights as obtained by the present estimates using the generalized inverse. The important conceptual advantage of incremental learning is, however, that there is no need to store exemplars. By way of example, for auditory comprehension, acoustic tokens are given with the input to the learning system. These tokens, the ephemeral result of interacting with the environment, leave their traces in the transformation matrix F_a, but are not themselves stored. The same holds for speech production. We assume that speakers dynamically construct the semantic vectors from their past experiences, rather than retrieving these semantic vectors from some dictionary-like fixed representation familiar from current file systems. The dynamic nature of these semantic vectors emerged loud and clear from the modeling of novel complex words for speech production under cross-validation. In other words, our hypothesis is that all input and output vectors are ephemeral, in the sense that they represent temporary states of the cognitive system that are not themselves represented in that system, but that leave their traces in the transformation matrices that jointly characterize and define the cognitive system that we approximate with the discriminative lexicon.
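To make the relation between the two estimation methods concrete, the following small R sketch (not the code used in this study; the matrices, their dimensions, the learning rate, and the number of learning events are all made up for illustration) compares the closed-form network obtained with the generalized inverse to a network trained incrementally with the Widrow-Hoff rule. No token is stored; each learning event only nudges the connection weights.

# A minimal sketch, assuming toy form and semantic matrices C and S:
# incremental Widrow-Hoff updating approximates the closed-form
# mapping estimated with the generalized inverse (MASS::ginv).
library(MASS)
set.seed(1)
n_words <- 20; n_cues <- 10; n_sem <- 5                         # hypothetical sizes
C <- matrix(rbinom(n_words * n_cues, 1, 0.4), n_words, n_cues)  # form features
S <- matrix(rnorm(n_words * n_sem), n_words, n_sem)             # semantic vectors

F_closed <- ginv(C) %*% S          # closed-form estimate, as in the present study

eta <- 0.01                        # learning rate
W   <- matrix(0, n_cues, n_sem)    # incrementally learned network
for (event in 1:20000) {
  i  <- sample(n_words, 1)         # one learning event: a single token
  ci <- C[i, , drop = FALSE]       # its cue vector
  si <- S[i, , drop = FALSE]       # its semantic vector
  W  <- W + eta * t(ci) %*% (si - ci %*% W)   # Widrow-Hoff (delta rule) update
}
cor(as.vector(W), as.vector(F_closed))  # should approach 1; the tokens themselves
                                        # are never stored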

This dynamic perspective on the mental lexicon, as consisting of several coupled networks that jointly bring the cognitive system into states that in turn feed into further networks (not modeled here) for sentence integration (comprehension) and articulation (production), raises the question about the status of units in this theory. Units play an important role when constructing numeric vectors for form and meaning. Units for letter trigrams and phone trigrams are of course context-enriched letter and phone units. Content lexomes, as well as lexomes for derivational and inflectional functions, are crutches for central semantic functions, ranging from functions realizing relatively concrete onomasiological functions ('this is a dog, not a cat') to abstract situational functions allowing entities, events, and states to be specified on the dimensions of time, space, quantity, aspect, etc. However, crucial to the central thrust of the present study, there are no morphological symbols (units combining form and meaning) nor morphological form units such as stems and exponents of any kind in the model. Above, we have outlined the many linguistic considerations that have led us not to want to incorporate such units as part of our theory. Importantly, as there are no hidden layers in our model, it is not possible to seek for 'subsymbolic' reflexes of morphological units in patterns of activation over hidden layers. Here we depart from earlier connectionist models, which rejected symbolic representations but retained the belief that there should be morphological representations, albeit subsymbolic ones. Seidenberg and Gonnerman [166], for instance, discuss morphological representations as being 'interlevel representations' that are emergent reflections of correlations among orthography, phonology, and semantics. This is not to say that the more agglutinative a language is, the more the present model will seem to the outside observer to be operating with morphemes. But the more a language tends towards fusional or polysynthetic morphology, the less strong this impression will be.

Model complexity. Next, consider the question of how difficult lexical processing actually is. Sidestepping the complexities of the neural embedding of lexical processing [126, 147], here we narrow down this question to algorithmic complexity. State-of-the-art models in psychology [37, 114, 167] implement many layers of hierarchically organized units, and many hold it for an established fact that such units are in some sense psychologically real (see, e.g., [19, 91, 168]). However, empirical results can be equally well understood without requiring the theoretical construct of the morpheme (see, e.g., [47, 54, 58, 166, 169, 170]). The present study illustrates this point for 'boundary' effects in written and oral production. When paths in the triphone graph branch, uncertainty increases and processing slows down. Although observed 'boundary' effects are compatible with a post-Bloomfieldian lexicon, the very same effects arise in our model, even though morphemes do not figure in any way in our computations.

Importantly, the simplicity of the linear mappings in our model of the discriminative lexicon allows us to sidestep a problem that typically arises in computationally implemented full-scale hierarchical systems, namely, that errors arising at lower levels propagate to higher levels. The major steps forward made in recent end-to-end models in language engineering suggest that end-to-end cognitive models are also likely to perform much better. One might argue that the hidden layers of deep learning networks represent the traditional 'hidden layers' of phonemes, morphemes, and word forms mediating between form and meaning in linguistic models and offshoots thereof in psychology. However, the case of baboon lexical learning [12, 13] illustrates that this research strategy is not without risk, and that simpler discriminative networks can substantially outperform deep learning networks [15]. We grant, however, that it is conceivable that model predictions will improve when the present linear networks are replaced by deep learning networks presented with the same input and output representations used here (see Zhao et al. [171] for a discussion of loss functions for deep learning targeting numeric instead of categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is bridging the gap between, for instance, the cochlea and retina on the one hand and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40–42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network F_a is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how to best model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds, and what its consequences are at both sides of the conscious/unconscious divide, is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points, a and b, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the x coordinate of a point next to the y coordinate:

\[ \mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A1} \]


Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices we use capital letters in bold font:

\[ \mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2} \]

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\[ \mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3} \]

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise, in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:
\[ \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4} \]

Such a mapping of A onto B is given by a matrix F

\[ \mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5} \]

and it is straightforward to verify that indeed
\[ \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6} \]

Using matrix notation we can write

AF = B (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8} \]

For numbers, the inverse of multiplication with s is dividing by s:
\[ \left(\frac{1}{s}\right)(sx) = x. \tag{A9} \]

For matrices, the inverse of a square matrix X is that matrix X⁻¹ such that their product is the identity matrix:

\[ \mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10} \]

For nonsquare matrices, the inverse Y⁻¹ is defined such that

\[ \mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11} \]

We find the square matrix F that maps the square matrices A onto B as follows:

\[ \begin{aligned} \mathbf{A}\mathbf{F} &= \mathbf{B} \\ \mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}. \end{aligned} \tag{A12} \]

Since for the present example A and its inverse happen to be identical (A⁻¹ = A), we obtain (A6).
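These steps are straightforward to verify computationally; the following R lines (a minimal check using base R's solve(), not code from the study) reproduce the worked example:

# The form vectors a and b are the rows of A; the corresponding
# semantic vectors x and y are the rows of B.
A <- matrix(c( 3,  4,
              -2, -3), nrow = 2, byrow = TRUE)
B <- matrix(c(-5,  2,
               4, -1), nrow = 2, byrow = TRUE)
Fmat <- solve(A) %*% B    # the comprehension mapping F of (A5), cf. (A12)
A %*% Fmat                # reproduces B, verifying (A6) and (A7)
# The production mapping G discussed below is obtained analogously,
# as solve(B) %*% A.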

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

BG = A (A13)

Solving this equation proceeds exactly as above

\[ \begin{aligned} \mathbf{B}\mathbf{G} &= \mathbf{A} \\ \mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\ \mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}. \end{aligned} \tag{A14} \]


The inverse of B is

\[ \mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}, \tag{A15} \]

and hence we have

\[ \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G}. \tag{A16} \]

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study we denote the generalized inverse of a matrix X by X′.
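As a small illustration (a sketch, not code from the study), the snippet below applies ginv to a singular matrix, for which solve() would signal an error, and checks the reproducing property of the generalized inverse:

library(MASS)
X <- matrix(c(1, 2,
              2, 4), nrow = 2, byrow = TRUE)   # rank 1, hence singular
# solve(X) fails for this matrix; ginv(X) returns the Moore-Penrose inverse
Xp <- ginv(X)                                  # X', the generalized inverse
round(X %*% Xp %*% X, 10)                      # equals X: X X' X = X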

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words, and columns the features of these words. For instance, the row vectors of A can represent words' form features, and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:
\[ \begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \end{pmatrix}. \tag{A17} \]

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = −5 and o2 = 2 for the output vector (−5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × −2 = −5, and for o2 we have 3 × 2 + 4 × −1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (−5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\[ \mathbf{s} = (2, 2), \tag{A18} \]

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

\[ \begin{pmatrix} 2 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 2 \end{pmatrix}. \tag{A19} \]
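Continuing the small R check given above (which defined Fmat), this productivity is simply one further matrix multiplication:

s <- c(2, 2)     # the novel form vector of (A18)
s %*% Fmat       # yields (-2, 2), the novel semantic vector of (A19)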

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). Form features go in at the input nodes, semantic vectors come out at the output nodes, and the connection weights are the transformation matrix. When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × −2 = −5, and that at o2 is 3 × 2 + 4 × −1 = 2. The network maps the point (3, 4) onto the point (−5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word-initial boundary symbol) to a word's final triphone (the triphone ending with the word-final boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
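As a concrete illustration of this path-selection logic, the toy R fragment below builds a small triphone graph with invented triphones and invented network supports (it is a sketch only, not the implementation used in the study, which is available in the WpmWithLdl package) and selects the path with the highest length-weighted support:

library(igraph)
# hypothetical triphones with hypothetical network support
support <- c("#wa" = 0.9, "wal" = 0.8, "alk" = 0.7,
             "lks" = 0.5, "ks#" = 0.6, "lk#" = 0.3)
edges <- data.frame(from = c("#wa", "wal", "alk", "lks", "alk"),
                    to   = c("wal", "alk", "lks", "ks#", "lk#"))
g <- graph_from_data_frame(edges, directed = TRUE)

# all simple paths from the initial triphone to either final triphone
paths <- c(all_simple_paths(g, from = "#wa", to = "ks#"),
           all_simple_paths(g, from = "#wa", to = "lk#"))

# raw path support divided by path length, so longer paths do not trivially win
weighted <- sapply(paths, function(p) sum(support[p$name]) / length(p))
paths[[which.max(weighted)]]$name   # the triphone path driving articulation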

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).
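A figure of this kind can be produced along the following lines. This is only a sketch: the actual semantic vectors are not reproduced here, so the matrix S below is a random stand-in with invented word labels, and gplots::heatmap.2 is simply one convenient function that draws a heatmap with a color key of the type shown in Figure 14.

library(gplots)
set.seed(42)
words <- c("we", "us", "she", "her", "across", "over")    # toy subset of lexomes
S <- matrix(rnorm(length(words) * 300), nrow = length(words),
            dimnames = list(words, NULL))                 # stand-in semantic vectors
R <- cor(t(S))                  # pairwise correlations of the row vectors
R <- pmin(R, 0.25)              # cap the range, as described in the caption
heatmap.2(R, trace = "none",
          col = colorRampPalette(c("blue", "white", "red"))(30))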


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.



[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G Csardi and T Nepusz ldquoThe igraph software package forcomplex network researchrdquo InterJournal Complex Systems vol1695 no 5 pp 1ndash9 2006

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018

Complexity 39

[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R Jackendoff Semantic StructuresMITPress Cambridge 1990[155] J J McCarthy ldquoA prosodic theory of non-concatenative mor-

phologyrdquo Linguistic Inquiry vol 12 pp 373ndash418 1981[156] A Ussishkin ldquoA Fixed Prosodic Theory of Nonconcatenative

Templaticmorphologyrdquo Natural Language amp Linguistic Theoryvol 23 no 1 pp 169ndash218 2005

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010

categorical output vectors). What is unclear is whether such networks will improve our understanding: the current linear networks offer the analyst connection weights between input and output representations that are straightforwardly linguistically interpretable. Where we think deep learning networks really will come into their own is in bridging the gap between, for instance, the cochlea and retina on the one hand, and the heuristic representations (letter trigrams) that we have used as a starting point for our functional theory of the mappings between form and meaning.

Network flexibility. An important challenge for deep learning networks designed to model human lexical processing is to keep the networks flexible and open to rapid updating. This openness to continuous learning is required not only by findings on categorization [40, 41, 42], including phonetic categorization [172, 173], but is also visible in time series of reaction times. Baayen and Hendrix [17] reported improved prediction of visual lexical decision reaction times in the British Lexicon Project when a naive discriminative learning network is updated from trial to trial, simulating the within-experiment learning that goes on as subjects proceed through the experiment. In natural speech, the phenomenon of phonetic convergence between interlocutors [174] likewise bears witness to a lexicon that is constantly recalibrated in interaction with its environment (for the many ways in which interlocutors' speech aligns, see [175]). Likewise, the rapidity with which listeners adapt to foreign accents [176], within a few minutes, shows that lexical networks exhibit fast local optimization. As deep learning networks typically require huge numbers of cycles through the training data, once trained, they are at risk of not having the required flexibility for fast local adaptation. This risk is reduced for (incremental) wide learning, as illustrated by the phonetic convergence evidenced by the model of Arnold et al. [18] for auditory comprehension (a very similar point was made by Levelt [177] in his critique of connectionist models in the nineties).
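To make the kind of trial-to-trial updating at issue here concrete, the sketch below implements a single Widrow-Hoff (delta-rule) update for a linear cue-to-outcome network in R. It is a minimal illustration under our own assumptions (toy cue and outcome vectors, an arbitrary learning rate eta), not the authors' ndl2 implementation.

```r
# One incremental Widrow-Hoff update of a linear network W (cues x outcomes).
# Toy dimensions and the learning rate are illustrative assumptions.
update_network <- function(W, cues, outcomes, eta = 0.01) {
  predicted <- as.vector(cues %*% W)   # current prediction for this trial
  error     <- outcomes - predicted    # prediction error (delta rule)
  W + eta * (cues %o% error)           # only cues present on the trial are adjusted
}

W <- matrix(0, nrow = 3, ncol = 2)     # 3 form cues, 2 lexomes, all weights start at 0
for (trial in 1:10) {                  # weights are nudged a little on every trial
  W <- update_network(W, cues = c(1, 0, 1), outcomes = c(1, 0))
}
round(W, 3)
```

Because each such update is cheap and local, a network of this kind can keep recalibrating itself over the course of an experiment or a conversation.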

Open questions. This initial study necessarily leaves many questions unanswered. Compounds and inflected derived words have not been addressed, and morphological operations such as reduplication, infixation, and nonconcatenative morphology provide further testing grounds for the present approach. We also need to address the processing of compounds with multiple constituents, ubiquitous in languages as different as German, Mandarin, and Estonian. Furthermore, in actual utterances, words undergo assimilation at their boundaries, and hence a crucial test case for the discriminative lexicon is to model the production and comprehension of words in multi-word utterances. For our production model, we have also completely ignored stress, syllabification, and pitch, which, however, likely has rendered the production task more difficult than necessary.

A challenge for discriminative morphology is the development of proper semantic matrices. For languages with rich morphology, such as Estonian, current morphological parsers will identify case-inflected words as nominatives, genitives, or partitives (among others), but these labels for inflectional forms do not correspond to the semantic functions encoded in these forms (see, e.g., [178, 179]). What is required are computational tools that detect the appropriate inflectional lexomes for these words in the sentences or utterances in which they are embedded.

A related problem, specifically for the speech production model, is how to predict the strongly reduced word forms that are rampant in conversational speech [111]. Our auditory comprehension network $\mathbf{F}_a$ is confronted with reduced forms in training, and the study of Arnold et al. [18] indicates that present identification accuracy may approximate human performance for single-word recognition. Whereas for auditory comprehension we are making progress, what is required for speech production is a much more detailed understanding of the circumstances under which speakers actually use reduced word forms. The factors driving reduction may be in part captured by the semantic vectors, but we anticipate that aspects of the unfolding discourse play an important role as well, which brings us back to the unresolved question of how best to model not isolated words but words in context.

An important challenge for our theory, which formalizes incremental implicit learning, is how to address and model the wealth of explicit knowledge that speakers have about their language. We play with the way words sound in poetry, we are perplexed by words' meanings and reinterpret them so that they make more sense (e.g., the folk etymology [180] reflected in modern English crawfish, a crustacean and not a fish, originally Middle English crevis; compare écrevisse in French), we teach morphology in schools, and we even find that making morphological relations explicit may be beneficial for aphasic patients [181]. The rich culture of word use cannot but interact with the implicit system, but how exactly this interaction unfolds and what its consequences are at both sides of the conscious/unconscious divide is presently unclear.

Although many questions remain, our model is remarkably successful in capturing many aspects of lexical and morphological processing in both comprehension and production. Since the model builds on a combination of linguistically motivated 'smart' low-level representations for form and meaning and large but otherwise straightforward and mathematically well-understood two-layer linear networks (to be clear, networks without any hidden layers), the conclusion seems justified that, algorithmically, the "mental lexicon" may be much simpler than previously thought.

Appendix

A. Vectors, Matrices, and Matrix Multiplication

Figure 12 presents two data points $a$ and $b$, with coordinates (3, 4) and (-2, -3). Arrows drawn from the origin to these points are shown in blue, with a solid and dashed line, respectively. We refer to these points as vectors. Vectors are denoted with lower case letters in bold, and we place the $x$ coordinate of a point next to the $y$ coordinate:
\[
\mathbf{a} = (3, 4), \qquad \mathbf{b} = (-2, -3). \tag{A1}
\]

Figure 12: Points $a$ and $b$ can be transformed into points $x$ and $y$ by a linear transformation.

We can represent $\mathbf{a}$ and $\mathbf{b}$ jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:
\[
\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \tag{A2}
\]
In addition to the points $a$ and $b$, Figure 12 also shows two other data points, $x$ and $y$. The matrix for these two points is
\[
\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A3}
\]

Let us assume that points $a$ and $b$ represent the forms of two words $\omega_1$ and $\omega_2$, and that the points $x$ and $y$ represent the meanings of these two words (below, we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points $a$ and $b$ into points $x$ and $y$, and likewise in how to transform points $x$ and $y$ back into points $a$ and $b$. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix $\mathbf{A}$, which we want to transform into matrix $\mathbf{B}$. A way of mapping $\mathbf{A}$ onto $\mathbf{B}$ is by means of a linear transformation using matrix multiplication. For $2 \times 2$ matrices, matrix multiplication is defined as follows:
\[
\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \tag{A4}
\]

Such a mapping of $\mathbf{A}$ onto $\mathbf{B}$ is given by a matrix $\mathbf{F}$,
\[
\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \tag{A5}
\]
and it is straightforward to verify that indeed
\[
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}
=
\begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \tag{A6}
\]
Using matrix notation, we can write
\[
\mathbf{A}\mathbf{F} = \mathbf{B}. \tag{A7}
\]
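For readers who wish to verify (A6) and (A7) numerically, the worked example can be reproduced in base R with the matrices defined above:

```r
# The form matrix A, the semantic matrix B, and the transformation F from (A2)-(A5).
A <- matrix(c(3, -2, 4, -3), nrow = 2)   # rows are the form vectors a and b
B <- matrix(c(-5, 4, 2, -1), nrow = 2)   # rows are the semantic vectors x and y
F <- matrix(c(1, -2, 2, -1), nrow = 2)   # note: this masks R's shorthand F for FALSE
A %*% F                                  # reproduces B, i.e., AF = B as in (A7)
```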

Given $\mathbf{A}$ and $\mathbf{B}$, how do we obtain $\mathbf{F}$? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix $\mathbf{I}$ leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed
\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \tag{A8}
\]

For numbers, the inverse of multiplication with $s$ is dividing by $s$:
\[
\left(\frac{1}{s}\right)(sx) = x. \tag{A9}
\]

For matrices, the inverse of a square matrix $\mathbf{X}$ is that matrix $\mathbf{X}^{-1}$ such that their product is the identity matrix:
\[
\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \tag{A10}
\]
For nonsquare matrices, the inverse $\mathbf{Y}^{-1}$ is defined such that
\[
\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \tag{A11}
\]

We find the square matrix $\mathbf{F}$ that maps the square matrix $\mathbf{A}$ onto $\mathbf{B}$ as follows:
\[
\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B} \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \tag{A12}
\]
Since for the present example $\mathbf{A}$ and its inverse happen to be identical ($\mathbf{A}^{-1} = \mathbf{A}$), we obtain (A6).
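The same derivation can be carried out numerically with base R's solve(), which returns the inverse of a nonsingular matrix:

```r
A <- matrix(c(3, -2, 4, -3), nrow = 2)
B <- matrix(c(-5, 4, 2, -1), nrow = 2)
solve(A)             # the inverse of A; for this particular A it equals A itself
solve(A) %*% B       # F = A^{-1} B, cf. (A12): the matrix given in (A5)
```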

$\mathbf{F}$ maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given $\mathbf{B}$, we want to find that matrix $\mathbf{G}$ which maps $\mathbf{B}$ onto $\mathbf{A}$, i.e.,
\[
\mathbf{B}\mathbf{G} = \mathbf{A}. \tag{A13}
\]
Solving this equation proceeds exactly as above:
\[
\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A} \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \tag{A14}
\]


The inverse of $\mathbf{B}$ is
\[
\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}, \tag{A15}
\]
and hence we have
\[
\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}
\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}
=
\begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix}
= \mathbf{G}. \tag{A16}
\]

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix $\mathbf{X}$ by $\mathbf{X}'$.
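A small example of the generalized inverse in R, using MASS::ginv as mentioned above; the singular matrix X below is a toy example of ours, not one of the paper's form or meaning matrices:

```r
library(MASS)                          # provides ginv()
X <- matrix(c(1, 2, 2, 4), nrow = 2)   # rank 1, so solve(X) would fail
Xp <- ginv(X)                          # Moore-Penrose generalized inverse X'
X %*% Xp %*% X                         # recovers X, a defining property of the pseudoinverse
```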

The examples presented here are restricted to $2 \times 2$ matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices $\mathbf{X}$ and $\mathbf{Y}$, the only constraint is that the number of columns of $\mathbf{X}$ is the same as the number of rows of $\mathbf{Y}$. The resulting matrix has the same number of rows as $\mathbf{X}$ and the same number of columns as $\mathbf{Y}$.

In the present study, the rows of matrices represent words, and the columns the features of these words. For instance, the row vectors of $\mathbf{A}$ can represent words' form features, and the row vectors of $\mathbf{B}$ the corresponding semantic features. The first row vector of $\mathbf{A}$ is mapped onto the first row vector of $\mathbf{B}$ as follows:
\[
(3 \;\; 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5 \;\; 2). \tag{A17}
\]

A transformation matrix such as $\mathbf{F}$ can be represented as a two-layer network. The network corresponding to $\mathbf{F}$ is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes $i_1$ and $i_2$ are set to 3 and 4, respectively. To obtain the activations $o_1 = -5$ and $o_2 = 2$ for the output vector $(-5, 2)$, we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for $o_1$ we obtain the value $3 \times 1 + 4 \times -2 = -5$, and for $o_2$ we have $3 \times 2 + 4 \times -1 = 2$. In other words, the network maps the input vector (3, 4) onto the output vector $(-5, 2)$.

An important property of linear maps is that they are productive. We can present a novel form vector, say
\[
\mathbf{s} = (2, 2), \tag{A18}
\]
to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:
\[
(2 \;\; 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2 \;\; 2). \tag{A19}
\]
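In R, this productivity amounts to nothing more than multiplying the novel form vector with the trained matrix:

```r
F <- matrix(c(1, -2, 2, -1), nrow = 2)   # the network of Figure 13 (masks R's shorthand F)
s <- c(2, 2)                             # a form vector not seen before
s %*% F                                  # its predicted semantic vector: (-2, 2), cf. (A19)
```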

Figure 13: The linear network corresponding to the transformation matrix $\mathbf{F}$. There are two input nodes, $i_1$ and $i_2$ (in blue), and two output nodes, $o_1$ and $o_2$ (in red). When the input vector (3, 4) is presented to the network, the value at $o_1$ is $3 \times 1 + 4 \times -2 = -5$, and that at $o_2$ is $3 \times 2 + 4 \times -1 = 2$. The network maps the point (3, 4) onto the point $(-5, 2)$. (In the figure, form features go in at the input nodes, the connection weights are the transformation matrix, and semantic vectors come out at the output nodes.)

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol #) to a word's final triphone (the triphone ending with a #) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, which is conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at these vertices are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
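The sketch below illustrates the core of this procedure with the igraph package. The triphone graph and the support values are invented for illustration, and the thresholding and cross-validation details described above are omitted, so this is not the authors' full implementation.

```r
# Toy illustration of graph-based triphone sequencing: enumerate all simple
# paths from a word-initial triphone (#..) to a word-final triphone (..#)
# and rank them by their length-weighted network support.
library(igraph)

edges <- data.frame(
  from = c("#ha", "han", "and", "#ha"),
  to   = c("han", "and", "nd#", "had"),      # "had" is a dead end in this toy graph
  stringsAsFactors = FALSE
)
support <- c("#ha" = 0.40, "han" = 0.35, "and" = 0.30, "nd#" = 0.25, "had" = 0.10)

g     <- graph_from_data_frame(edges, directed = TRUE)
paths <- all_simple_paths(g, from = "#ha", to = "nd#", mode = "out")

# weighted support = raw path support divided by path length
weighted <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))
as_ids(paths[[which.max(weighted)]])         # "#ha" "han" "and" "nd#"
```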

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85]; for code, see the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
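Such a heatmap can be generated with base R from any matrix of semantic vectors (one row per word). The snippet below uses random placeholder vectors rather than the paper's actual semantic matrix, purely to show the mechanics.

```r
set.seed(1)
words <- c("we", "us", "she", "her", "over", "across")   # a few of the words in Figure 14
S <- matrix(rnorm(length(words) * 20), nrow = length(words),
            dimnames = list(words, NULL))                # placeholder semantic vectors
corrs <- cor(t(S))                    # word-by-word correlations of the semantic vectors
heatmap(corrs, symm = TRUE, margins = c(6, 6))           # clustered heatmap, in the spirit of Figure 14
```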


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A. Hannun, C. Case, J. Casper et al., "Deep Speech: Scaling up end-to-end speech recognition," 2014, https://arxiv.org/abs/1412.5567.

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] C. Cucchiarini and H. Strik, "Automatic phonetic transcription: An overview," in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.
[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.
[118] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.
[119] D. Arnold, "AcousticNDLCodeR: Coding sound files for use with NDL, R package version 1.0.1," 2017.
[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.
[121] J. Pickett and I. Pollack, "Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt," Language and Speech, vol. 6, pp. 151–164, 1963.
[122] L. Shockey, "Perception of reduced forms by non-native speakers of English," in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.
[123] M. Ernestus, H. Baayen, and R. Schreuder, "The recognition of reduced word forms," Brain and Language, vol. 81, no. 1–3, pp. 162–173, 2002.
[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, "Comprehension without segmentation: a proof of concept with naive discriminative learning," Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.
[125] C. P. Browman and L. Goldstein, "Articulatory phonology: An overview," Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.
[126] J. A. Tourville and F. H. Guenther, "The DIVA model: A neural theory of speech acquisition and production," Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.
[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.
[128] M. Ernestus and R. H. Baayen, "Predicting the unpredictable: Interpreting neutralized segments in Dutch," Language, vol. 79, no. 1, pp. 5–38, 2003.
[129] R. Weingarten, G. Nottbusch, and U. Will, "Morphemes, syllables and graphemes in written word production," in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.
[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, "Cascaded processing in written compound word production," Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.
[131] T. Cho, "Effects of morpheme boundaries on intergestural timing: Evidence from Korean," Phonetica, vol. 58, no. 3, pp. 129–162, 2001.
[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.
[133] M. Seidenberg, "Sublexical structures in visual word recognition: Access units or orthographic redundancy?" in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.
[134] J. M. McQueen, "Segmentation of continuous speech using phonotactics," Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.
[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA and London, UK, 2003.
[136] J. B. Hay, "From speech perception to morphology: Affix-ordering revisited," Language, vol. 78, pp. 527–555, 2002.
[137] J. B. Hay and R. H. Baayen, "Phonotactics, parsing and productivity," Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.
[138] J. Hay, J. Pierrehumbert, and M. Beckman, "Speech perception, well-formedness, and the statistics of the lexicon," Papers in Laboratory Phonology VI, pp. 58–74, 2004.
[139] M. Ferro, C. Marzi, and V. Pirrelli, "A self-organizing model of word storage and processing: implications for morphology learning," Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.
[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, "Topological self-organization and prediction learning support both action and lexical chains in the brain," Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.
[141] V. Pirrelli, M. Ferro, and C. Marzi, "Computational complexity of abstractive morphology," in Understanding and Measuring Morphological Complexity, M. Baerman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.
[142] C. Marzi, M. Ferro, and V. Pirrelli, "Is inflectional irregularity dysfunctional to human processing?" in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.
[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, "Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment," The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.
[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.
[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, "Processing reduced word forms: The suffix restoration effect," Brain and Language, vol. 90, no. 1–3, pp. 117–127, 2004.
[146] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, June 2005.
[147] G. Hickok, "The architecture of speech production and the role of the phoneme in speech processing," Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.
[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.
[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, manuscript, University of Siegen/Tübingen/Nijmegen, 2018.
[151] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[152] B. Russell, "On denoting," Mind, vol. 14, no. 56, pp. 479–493, 1905.
[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.
[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.
[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.
[156] A. Ussishkin, "A fixed prosodic theory of nonconcatenative templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.
[157] A. Ussishkin, "Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology," Morphology, vol. 16, no. 1, pp. 107–125, 2006.
[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.
[159] S. C. Levinson and A. Majid, "The island of time: Yélî Dnye, the language of Rossel Island," Frontiers in Psychology, vol. 4, 2013.
[160] S. C. Levinson, "The language of space in Yélî Dnye," in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.
[161] E. Shafaei-Bajestan and R. H. Baayen, "Wide learning for auditory comprehension," in Proceedings of Interspeech 2018, 2018.
[162] J. Hay and K. Drager, "Stuffed toys and speech perception," Linguistics, vol. 48, no. 4, pp. 865–892, 2010.
[163] V. Henri and C. Henri, "On our earliest recollections of childhood," Psychological Review, vol. 2, pp. 215–216, 1895.
[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.
[165] P. J. Bauer and M. Larkina, "The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events," Memory, vol. 22, no. 8, pp. 907–924, 2014.
[166] M. S. Seidenberg and L. M. Gonnerman, "Explaining derivational morphology as the convergence of codes," Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.
[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, "DRC: a dual route cascaded model of visual word recognition and reading aloud," Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.
[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.
[169] M. W. Harm and M. S. Seidenberg, "Phonology, reading acquisition, and dyslexia: Insights from connectionist models," Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.
[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, "Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology," Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.
[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
[172] C. Clarke and P. Luce, "Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization," in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.
[173] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[174] A. Schweitzer and N. Lewandowski, "Social factors in convergence of F1 and F2 in spontaneous speech," in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.
[175] M. J. Pickering and S. Garrod, "Toward a mechanistic psychology of dialogue," Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.
[176] C. M. Clarke and M. F. Garrett, "Rapid adaptation to foreign-accented English," The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.
[177] W. J. Levelt, "Die konnektionistische Mode," Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.
[178] A. Kostić, "Informational load constraints on processing inflected morphology," in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.
[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.
[180] E. Förstemann, "Über deutsche Volksetymologie," Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.
[181] K. Nault, Morphological Therapy Protocol, [Ph.D. thesis], University of Alberta, Edmonton, 2010.



[Figure 12 appears here: a plot of the points a, b, x, and y in the x-y plane.]

Figure 12: Points a and b can be transformed into points p and q by a linear transformation.

We can represent a and b jointly by placing them next to each other in a matrix. For matrices, we use capital letters in bold font:

\mathbf{A} = \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix}. \qquad (A2)

In addition to the points a and b, Figure 12 also shows two other data points, x and y. The matrix for these two points is

\mathbf{B} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \qquad (A3)

Let us assume that points a and b represent the forms of two words ω1 and ω2, and that the points x and y represent the meanings of these two words (below we discuss in detail how exactly words' forms and meanings can be represented as vectors of numbers). As morphology is the study of the relation between words' forms and their meanings, we are interested in how to transform points a and b into points x and y, and likewise in how to transform points x and y back into points a and b. The first transformation is required for comprehension, and the second transformation for production.

We begin with the transformation for comprehension. Formally, we have matrix A, which we want to transform into matrix B. A way of mapping A onto B is by means of a linear transformation, using matrix multiplication. For 2 × 2 matrices, matrix multiplication is defined as follows:

\begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} a_1 x_1 + a_2 y_1 & a_1 x_2 + a_2 y_2 \\ b_1 x_1 + b_2 y_1 & b_1 x_2 + b_2 y_2 \end{pmatrix}. \qquad (A4)

Such a mapping of A onto B is given by a matrix F,

\mathbf{F} = \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix}, \qquad (A5)

and it is straightforward to verify that indeed

\begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 4 & -1 \end{pmatrix}. \qquad (A6)
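As a quick illustration (ours, not part of the original appendix), this check can be reproduced in R, whose %*% operator implements the matrix multiplication of (A4):

    A  <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)   # rows are the points a and b, cf. (A2)
    Fm <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)   # the matrix F of (A5); named Fm to avoid R's built-in F
    B  <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)   # rows are the points x and y, cf. (A3)
    all.equal(A %*% Fm, B)                                  # TRUE: A F = B, i.e. (A6)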

Using matrix notation we can write

\mathbf{A}\mathbf{F} = \mathbf{B}. \qquad (A7)

Given A and B, how do we obtain F? The answer is straightforward, but we need two further concepts: the identity matrix and the matrix inverse. For multiplication of numbers, the identity multiplication is multiplication with 1. For matrices, multiplication with the identity matrix I leaves the locations of the points unchanged. The identity matrix has ones on the main diagonal and zeros elsewhere. It is easily verified that indeed

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}. \qquad (A8)

For numbers, the inverse of multiplication with s is dividing by s:

\left(\frac{1}{s}\right)(sx) = x. \qquad (A9)

For matrices, the inverse of a square matrix X is that matrix X^{-1} such that their product is the identity matrix:

\mathbf{X}^{-1}\mathbf{X} = \mathbf{X}\mathbf{X}^{-1} = \mathbf{I}. \qquad (A10)

For nonsquare matrices, the inverse Y^{-1} is defined such that

\mathbf{Y}^{-1}(\mathbf{Y}\mathbf{X}) = (\mathbf{X}\mathbf{Y})\mathbf{Y}^{-1} = \mathbf{X}. \qquad (A11)

We find the square matrix F that maps the square matrix A onto B as follows:

\begin{aligned}
\mathbf{A}\mathbf{F} &= \mathbf{B} \\
\mathbf{A}^{-1}\mathbf{A}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{I}\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B} \\
\mathbf{F} &= \mathbf{A}^{-1}\mathbf{B}.
\end{aligned} \qquad (A12)

Since for the present example A and its inverse happen to be identical (A^{-1} = A), we obtain (A6).
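Continuing the illustrative R example (ours), F can be recovered from A and B with solve(), which returns the inverse of a nonsingular matrix:

    A  <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
    B  <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
    Fm <- solve(A) %*% B   # F = A^{-1} B, the comprehension mapping of (A12)
    Fm                     # the matrix of (A5); note that solve(A) is indeed identical to A here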

F maps words' form vectors onto words' semantic vectors. Let us now consider the reverse, and see how we can obtain words' form vectors from words' semantic vectors. That is, given B, we want to find that matrix G which maps B onto A, i.e.,

\mathbf{B}\mathbf{G} = \mathbf{A}. \qquad (A13)

Solving this equation proceeds exactly as above:

\begin{aligned}
\mathbf{B}\mathbf{G} &= \mathbf{A} \\
\mathbf{B}^{-1}\mathbf{B}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{I}\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A} \\
\mathbf{G} &= \mathbf{B}^{-1}\mathbf{A}.
\end{aligned} \qquad (A14)


The inverse of B is

\mathbf{B}^{-1} = \begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix}, \qquad (A15)

and hence we have

\begin{pmatrix} 1/3 & 2/3 \\ 4/3 & 5/3 \end{pmatrix} \begin{pmatrix} 3 & 4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} -1/3 & -2/3 \\ 2/3 & 1/3 \end{pmatrix} = \mathbf{G}. \qquad (A16)
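In the same illustrative R terms (ours), the production mapping G is obtained by inverting B:

    A <- matrix(c(3, 4, -2, -3), nrow = 2, byrow = TRUE)
    B <- matrix(c(-5, 2, 4, -1), nrow = 2, byrow = TRUE)
    G <- solve(B) %*% A    # G = B^{-1} A, cf. (A14)-(A16)
    all.equal(B %*% G, A)  # TRUE: G maps the semantic vectors back onto the form vectors, cf. (A13)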

The inverse of a matrix need not exist. A square matrix that does not have an inverse is referred to as a singular matrix. Almost all matrices with which we work in the remainder of this study are singular. For singular matrices, an approximation of the inverse can be used, such as the Moore-Penrose generalized inverse. (In R, the generalized inverse is available in the MASS package, function ginv. The numeric library that is used by ginv is LAPACK, available at http://www.netlib.org/lapack.) In this study, we denote the generalized inverse of a matrix X by X′.
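For instance (a toy example of ours, not code from the study), the defining property of the generalized inverse can be checked with MASS::ginv on a singular matrix:

    library(MASS)
    X  <- matrix(c(1, 2, 2, 4), nrow = 2)   # rank 1 and hence singular; solve(X) would throw an error
    Xp <- ginv(X)                           # the Moore-Penrose generalized inverse, written X' in this study
    all.equal(X %*% Xp %*% X, X)            # TRUE: X X' X = X, the defining Moore-Penrose property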

The examples presented here are restricted to 2 × 2 matrices, but the mathematics generalize to matrices of larger dimensions. When multiplying two matrices X and Y, the only constraint is that the number of columns of X is the same as the number of rows of Y. The resulting matrix has the same number of rows as X and the same number of columns as Y.

In the present study, the rows of matrices represent words and the columns the features of these words. For instance, the row vectors of A can represent words' form features and the row vectors of B the corresponding semantic features. The first row vector of A is mapped onto the first row vector of B as follows:

(3 \quad 4) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-5 \quad 2). \qquad (A17)

A transformation matrix such as F can be represented as a two-layer network. The network corresponding to F is shown in Figure 13. When the input vector (3, 4) is presented to the network, the nodes i1 and i2 are set to 3 and 4, respectively. To obtain the activations o1 = -5 and o2 = 2 for the output vector (-5, 2), we cumulate the evidence at the input, weighted by the connection strengths specified on the edges of the graph. Thus, for o1 we obtain the value 3 × 1 + 4 × -2 = -5, and for o2 we have 3 × 2 + 4 × -1 = 2. In other words, the network maps the input vector (3, 4) onto the output vector (-5, 2).

An important property of linear maps is that they are productive. We can present a novel form vector, say

\mathbf{s} = (2 \quad 2), \qquad (A18)

to the network, and it will map this vector onto a novel semantic vector. Using matrices, this new semantic vector is straightforwardly calculated:

(2 \quad 2) \begin{pmatrix} 1 & 2 \\ -2 & -1 \end{pmatrix} = (-2 \quad 2). \qquad (A19)
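In the illustrative R example (ours), this amounts to multiplying the unseen form vector with the same, unchanged weight matrix:

    Fm <- matrix(c(1, 2, -2, -1), nrow = 2, byrow = TRUE)  # the network weights F of (A5)
    s  <- c(2, 2)                                          # a form vector not seen before
    s %*% Fm                                               # (-2, 2): its predicted semantic vector, cf. (A19)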

[Figure 13 appears here: a two-layer network with input nodes i1 and i2, output nodes o1 and o2, and connection weights 1, 2, -2, -1; form features go in at the bottom, semantic vectors come out at the top, and the connection weights are the transformation matrix.]

Figure 13: The linear network corresponding to the transformation matrix F. There are two input nodes, i1 and i2 (in blue), and two output nodes, o1 and o2 (in red). When the input vector (3, 4) is presented to the network, the value at o1 is 3 × 1 + 4 × -2 = -5 and that at o2 is 3 × 2 + 4 × -1 = 2. The network maps the point (3, 4) onto the point (-5, 2).

B. Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencing is that, in the directed graph that has triphones as vertices and an edge between any two vertices that overlap (i.e., segments 2 and 3 of triphone 1 are identical to segments 1 and 2 of triphone 2), the path from a word's initial triphone (the triphone starting with the word boundary symbol) to a word's final triphone (the triphone ending with the word boundary symbol) is the path receiving the best support of all possible paths from the initial to the final triphone. Unfortunately, the directed graph containing all vertices and edges is too large to make computations tractable in a reasonable amount of computation time. The following algorithm is a heuristic algorithm that first collects potentially relevant vertices and edges, and then calculates all paths starting from any initial vertex, using the all_simple_paths function from the igraph package [127]. As a first step, all vertices that are supported by the stem and whole word with network support exceeding 0.1 are selected. To the resulting set, those vertices are added that are supported by the affix, conditional on these vertices having a network support exceeding 0.95. A directed graph can now be constructed, and for each initial vertex, all paths starting at that vertex are calculated. The subset of those paths reaching a final vertex is selected, and the support for each path is calculated. As longer paths trivially will have greater support, the support for a path is weighted for its length, simply by dividing the raw support by the path length. Paths are ranked by weighted path support, and the path with maximal support is selected as the acoustic image driving articulation.
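A minimal sketch of this heuristic in R with igraph is given below (ours, not the authors' WpmWithLdl implementation); the objects edges (a two-column data frame of overlapping triphone pairs) and support (a named vector with the network support of each selected triphone) are hypothetical placeholders, and "#" is assumed to mark the word boundary in the triphone coding:

    library(igraph)
    g <- graph_from_data_frame(edges, directed = TRUE)         # vertices: triphones; edges: two-segment overlap
    initial <- V(g)[grepl("^#", V(g)$name)]                    # word-initial triphones
    final   <- V(g)[grepl("#$", V(g)$name)]                    # word-final triphones
    paths <- do.call(c, lapply(as_ids(initial),                # all simple paths from any initial vertex
                               function(v) all_simple_paths(g, from = v, to = final)))
    weighted <- sapply(paths, function(p) sum(support[as_ids(p)]) / length(p))  # raw support divided by path length
    best <- as_ids(paths[[which.max(weighted)]])               # the triphone path driving articulation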

It is possible that no path from an initial vertex to a final vertex is found, due to critical boundary triphones not being instantiated in the data on which the model was trained. This happens when the model is assessed on novel inflected and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

[Figure 14 appears here: a clustered heatmap of the pairwise correlations between the semantic vectors of English pronouns and prepositions, with a color key (from -0.1 to 0.2) and a histogram of the correlation values.]

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and, for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
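As a rough sketch of how such a figure can be produced (ours, assuming a hypothetical matrix S holding one semantic vector per row, labeled by word):

    S_cor <- cor(t(S))           # pairwise correlations between the words' semantic (row) vectors
    heatmap(S_cor, symm = TRUE)  # clustered correlation heatmap along the lines of Figure 14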


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.



Page 33: The Discriminative Lexicon: A Unified Computational Model ...downloads.hindawi.com/journals/complexity/2019/4895891.pdf · as developed within computational linguistics and com-puter

Complexity 33

The inverse of B is

Bminus1 = (13 2343 53) (A15)

and hence we have

(13 2343 53) ( 3 4minus2 minus3) = (minus13 minus2323 13 ) = G (A16)

The inverse of a matrix needs not exist A square matrixthat does not have an inverse is referred to as a singularmatrix Almost all matrices with which we work in theremainder of this study are singular For singular matricesan approximation of the inverse can be used such as theMoore-Penrose generalized inverse (In R the generalizedinverse is available in theMASS package function ginv Thenumeric library that is used by ginv is LAPACK availableat httpwwwnetliborglapack) In this study wedenote the generalized inverse of a matrix X by X1015840

The examples presented here are restricted to 2 times 2matrices but themathematics generalize tomatrices of largerdimensions When multiplying two matrices X and Y theonly constraint is that the number of columns ofX is the sameas the number of rows ofYThe resultingmatrix has the samenumber of rows as X and the same number of columns as Y

In the present study the rows of matrices represent wordsand columns the features of thesewords For instance the rowvectors of A can represent wordsrsquo form features and the rowvectors of and B the corresponding semantic features Thefirst row vector of A is mapped onto the first row vector ofB as follows (3 4) ( 1 2minus2 minus1) = (minus5 2) (A17)

A transformationmatrix such as F can be represented as atwo-layer networkThe network corresponding to F is shownin Figure 13 When the input vector (3 4) is presented to thenetwork the nodes 1198941 and 1198942 are set to 3 and 4 respectively Toobtain the activations 1199001 = minus5 and 1199002 = 2 for the output vector(minus5 2) we cumulate the evidence at the input weighted bythe connection strengths specified on the edges of the graphThus for 1199001 we obtain the value 3 times 1 + 4 times minus2 = minus5 and for1199002 we have 3 times 2 + 4 times minus1 = 2 In other words the networkmaps the input vector (3 4) onto the output vector (minus5 2)

An important property of linear maps is that they areproductive We can present a novel form vector say

s = (2 2) (A18)

to the network and it will map this vector onto a novelsemantic vector Using matrices this new semantic vector isstraightforwardly calculated(2 2) ( 1 2minus2 minus1) = (minus2 2) (A19)

1 2 minus2 minus1

i1 i2

o1 o2

semantic vectors come out here

form features go in here

connection weightsare the transformationmatrix

Figure 13 The linear network corresponding to the transformationmatrix F There are two input nodes 1198941 and 1198942 (in blue)and twooutput nodes 1199001 and 1199002 in red When the input vector (3 4) ispresented to the network the value at 1199001 is 3 times 1 + 4 times minus2 = minus5and that at 1199002 is 3 times 2 + 4 times minus1 = 2 The network maps the point(3 4) onto the point (minus5 2)B Graph-Based Triphone Sequencing

The hypothesis underlying graph-based triphone sequencingis that in the directed graph that has triphones as verticesand an edge between any two vertices that overlap (iesegments 2 and 3 of triphone 1 are identical to segments 1and 2 of triphone 2) the path from a wordrsquos initial triphone(the triphone starting with ) to a wordrsquos final triphone (thetriphone ending with a ) is the path receiving the bestsupport of all possible paths from the initial to the finaltriphone Unfortunately the directed graph containing allvertices and edges is too large tomake computations tractablein a reasonable amount of computation time The followingalgorithm is a heuristic algorithm that first collects potentiallyrelevant vertices and edges and then calculates all pathsstarting fromany initial vertex using theall simple pathsfunction from the igraph package [127] As a first step allvertices that are supported by the stem and whole wordwith network support exceeding 01 are selected To theresulting set those vertices are added that are supported bythe affix which is conditional on these vertices having anetwork support exceeding 095 A directed graph can nowbe constructed and for each initial vertex all paths startingat these vertices are calculated The subset of those pathsreaching a final vertex is selected and the support for eachpath is calculated As longer paths trivially will have greatersupport the support for a path is weighted for its lengthsimply by dividing the raw support by the path length Pathsare ranked by weighted path support and the path withmaximal support is selected as the acoustic image drivingarticulation

It is possible that no path from an initial vertex to a finalvertex is found due to critical boundary triphones not beinginstantiated in the data on which the model was trainedThis happens when the model is assessed on novel inflected

Figure 14: Heatmap for the correlations of the semantic vectors of pronouns and prepositions. Note that quantifier pronouns form a group of their own, that prepositions and the other pronouns form distinguishable groups, and that within groups further subclusters are visible, e.g., for we and us, she and her, and across and over. All diagonal elements represent correlations equal to 1 (this is not brought out by the heatmap, which color-codes diagonal elements with the color for the highest correlation of the range we specified, 0.25).

and derived words under cross-validation. Therefore, vertices and edges are added to the graph for all nonfinal vertices that are at the end of any of the paths starting at any initial triphone. Such novel vertices and edges are assigned zero network support. Typically, paths from the initial vertex to the final vertex can now be constructed, and path support is evaluated as above.

For an algorithm that allows triphone paths to include cycles, which may be the case in languages with much richer morphology than English, see Baayen et al. [85], and for code, the R package WpmWithLdl described therein.

C. A Heatmap for Function Words (Pronouns and Prepositions)

The heatmap for the correlations of the semantic vectors of pronouns and prepositions is presented in Figure 14.
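For illustration, a minimal R sketch of how such a correlation heatmap can be computed; it assumes a matrix S whose rows are the semantic vectors of the function words and is not the code that produced Figure 14.

```r
# S is an assumed matrix with one row per function word (row names "we", "us",
# "she", "her", ...) holding that word's semantic vector.
r <- cor(t(S))            # pairwise correlations between the row vectors of S
heatmap(r, symm = TRUE,   # symmetric input: rows and columns clustered identically
        col = hcl.colors(50))
```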


Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC Advanced Grant (no. 742545) to the first author.

References

[1] G Frege ldquoBegriffsschrift a formula language modeled uponthat of arithmetic for pure thoughtrdquo in From Frege to GodelA Source Book in Mathematical Logic 1879-1931 pp 1ndash82 1967

[2] B RussellAn Inquiry intoMeaning andTruth Allen andUnwinLondon UK 1942

[3] R Montague ldquoThe proper treatment of quantification in ordi-nary englishrdquo in Approaches to Natural Language pp 221ndash242Springer 1973

[4] N Chomsky Syntactic Structures vol 33 Mouton The Hague1957

[5] W J M Levelt Formal Grammars in Linguistics And Psycholin-guistics Volume 1 An Introduction to the Theory of FormalLanguages and Automata Volume 2 Applications in LinguisticTheory Volume 3 Psycholinguistic Applications John BenjaminsPublishing 2008

[6] D F Kleinschmidt and T Florian Jaeger ldquoRobust speechperception Recognize the familiar generalize to the similar andadapt to the novelrdquo Psychological Review vol 122 no 2 pp 148ndash203 2015

[7] L Breiman ldquoRandom forestsrdquo Machine Learning vol 45 no 1pp 5ndash32 2001

[8] C Strobl A-L Boulesteix A Zeileis and T Hothorn ldquoBiasin random forest variable importance measures illustrationssources and a solutionrdquo BMC Bioinformatics vol 8 2007

[9] A Hannun C Case J Casper et al ldquoDeep speechScaling up end-to-end speech recognitionrdquo 2014httpsarxivorgabs14125567

[10] Z Tuske P Golik R Schluter and H Ney ldquoAcoustic modelingwith deep neural networks using raw time signal for lvcsrrdquo inProceedings of the INTERSPEECH pp 890ndash894 2014

[11] J Schmidhuber ldquoDeep learning in neural networks anoverviewrdquo Neural Networks vol 61 pp 85ndash117 2015

[12] T Hannagan J C Ziegler S Dufau J Fagot and J GraingerldquoDeep learning of orthographic representations in baboonsrdquoPLoS ONE vol 9 no 1 2014

[13] J Grainger S Dufau M Montant J C Ziegler and J FagotldquoOrthographic processing in baboons (Papio papio)rdquo Sciencevol 336 no 6078 pp 245ndash248 2012

[14] D Scarf K Boy A U Reinert J Devine O Gunturkun andM Colombo ldquoOrthographic processing in pigeons (columba

livia)rdquo Proceedings of the National Academy of Sciences vol 113no 40 pp 11272ndash11276 2016

[15] M Linke F Broker M Ramscar and R H Baayen ldquoArebaboons learning ldquoorthographicrdquo representations Probablynotrdquo PLoS ONE vol 12 no 8 2017

[16] D H Hubel and T N Wiesel ldquoReceptive fields binocularinteraction and functional architecture in the catrsquos visualcortexrdquo The Journal of Physiology vol 160 pp 106ndash154 1962

[17] R H Baayen and P Hendrix ldquoTwo-layer networks non-linear separation and human learningrdquo in From Semantics toDialectometry Festschrift in honor of JohnNerbonne Tributes 32M Wieling M Kroon G Van Noord and G Bouma Eds pp13ndash22 College Publications 2017

[18] D Arnold F Tomaschek K Sering F Lopez and R H BaayenldquoWords from spontaneous conversational speech can be rec-ognized with human-like accuracy by an error-driven learningalgorithm that discriminates between meanings straight fromsmart acoustic features bypassing the phoneme as recognitionunitrdquo PLoS ONE vol 12 no 4 2017

[19] P Zwitserlood ldquoProcessing and representation of morphologi-cal complexity in native language comprehension and produc-tionrdquo in The Construction of Words G E Booij Ed pp 583ndash602 Springer 2018

[20] G Booij ldquoConstruction morphologyrdquo Language and LinguisticsCompass vol 4 no 7 pp 543ndash555 2010

[21] J P Blevins ldquoWord-based morphologyrdquo Journal of Linguisticsvol 42 no 3 pp 531ndash573 2006

[22] G Stump Inflectional Morphology A Theory of ParadigmStructure Cambridge University Press 2001

[23] R Beard ldquoOn the extent and nature of irregularity in thelexiconrdquo Lingua vol 42 no 4 pp 305ndash341 1977

[24] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1974

[25] P H Matthews Morphology An Introduction to the Theory ofWord Structure Cambridge University Press Cambridge 1991

[26] C Hockett ldquoThe origin of speechrdquo Scientific American vol 203pp 89ndash97 1960

[27] M Erelt Estonian Language M Erelt Ed Estonian AcademyPublishers Tallinn Estonia 2003

[28] P H Matthews Grammatical Theory in the United States fromBloomfield to Chomsky Cambridge University Press Cam-bridge 1993

[29] N Chomsky andMHalleTheSound Pattern of English Harperand Row NY USA 1968

[30] T Landauer and S Dumais ldquoA solution to Platorsquos problem Thelatent semantic analysis theory of acquisition induction andrepresentation of knowledgerdquo Psychological Review vol 104 no2 pp 211ndash240 1997

[31] W Weaver ldquoTranslationrdquo in Machine Translation of LanguagesFourteen Essays W N Locke and A D Booth Eds pp 15ndash23MIT Press Cambridge 1955

[32] J R Firth Selected Papers of J R Firth 1952-59 IndianaUniversity Press 1968

[33] K Lund and C Burgess ldquoProducing high-dimensional seman-tic spaces from lexical co-occurrencerdquo Behavior Research Meth-ods Instruments and Computers vol 28 no 2 pp 203ndash2081996

[34] C Shaoul and C Westbury ldquoExploring lexical co-occurrencespace using HiDExrdquo Behavior Research Methods vol 42 no 2pp 393ndash413 2010


[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983


[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R. H. Baayen, Y. Chuang, and J. P. Blevins, "Inflectional morphology with linear mappings," The Mental Lexicon, vol. 13, no. 2, pp. 232–270, 2018.

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005


[110] M Pitt K Johnson E Hume S Kiesling and W RaymondldquoThe Buckeye corpus of conversational speech Labeling con-ventions and a test of transcriber reliabilityrdquo Speech Communi-cation vol 45 no 1 pp 89ndash95 2005

[111] K Johnson ldquoMassive reduction in conversational AmericanEnglishrdquo in Proceedings of the Spontaneous Speech Data andAnalysis Proceedings of the 1st Session of the 10th InternationalSymposium pp 29ndash54 The National International Institute forJapanese Language Tokyo Japan 2004

[112] C Phillips ldquoLevels of representation in the electrophysiology ofspeech perceptionrdquo Cognitive Science vol 25 no 5 pp 711ndash7312001

[113] R L Diehl A J Lotto and L L Holt ldquoSpeech perceptionrdquoAnnual Review of Psychology vol 55 pp 149ndash179 2004

[114] D Norris and J M McQueen ldquoShortlist B a Bayesian model ofcontinuous speech recognitionrdquo Psychological Review vol 115no 2 pp 357ndash395 2008

[115] S Hawkins ldquoRoles and representations of systematic finephonetic detail in speech understandingrdquo Journal of Phoneticsvol 31 no 3-4 pp 373ndash405 2003

[116] CCucchiarini andH Strik ldquoAutomatic Phonetic TranscriptionAn overviewrdquo in Proceedings of the 15th ICPhS pp 347ndash350Barcelona Spain 2003

[117] K JohnsonTheAuditoryPerceptual Basis for Speech Segmenta-tion Ohio State University Working Papers in Linguistics 1997

[118] H Fletcher ldquoAuditory patternsrdquo Reviews of Modern Physics vol12 no 1 pp 47ndash65 1940

[119] D Arnold ldquoAcousticndlcoder Coding sound files for use withndl R package version 101rdquo 2017

[120] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2016

[121] J Pickett and I Pollack ldquoIntelligibility of excerpts from fluentspeech Effects of rate of utterance and duration of excerptrdquoLanguage and Speech vol 6 pp 151ndash164 1963

[122] L Shockey ldquoPerception of reduced forms by non-native speak-ers of Englishrdquo in Sound Patterns of Spontaneous Speech DDuez Ed pp 97ndash100 ESCA Aix 1998

[123] M Ernestus H Baayen and R Schreuder ldquoThe recognition ofreduced word formsrdquo Brain and Language vol 81 no 1-3 pp162ndash173 2002

[124] R H Baayen C Shaoul J Willits and M Ramscar ldquoCom-prehension without segmentation a proof of concept withnaive discriminative learningrdquo Language Cognition and Neu-roscience vol 31 no 1 pp 106ndash128 2016

[125] C P Browman and L Goldstein ldquoArticulatory phonology Anoverviewrdquo Phonetica vol 49 no 3-4 pp 155ndash180 1992

[126] J A Tourville and F H Guenther ldquoThe DIVA model A neuraltheory of speech acquisition and productionrdquo Language andCognitive Processes vol 26 no 7 pp 952ndash981 2011

[127] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.

[128] M Ernestus and R H Baayen ldquoPredicting the unpredictableInterpreting neutralized segments in Dutchrdquo Language vol 79no 1 pp 5ndash38 2003

[129] R Weingarten G Nottbusch and U Will ldquoMorphemes sylla-bles and graphemes in written word productionrdquo inMultidisci-plinary Approaches to Speech Production T Pechmann and CHabel Eds pp 529ndash572 Mouton de Gruyter Berlin Germany2004

[130] R Bertram F E Toslashnnessen S Stromqvist J Hyona andP Niemi ldquoCascaded processing in written compound wordproductionrdquo Frontiers in Human Neuroscience vol 9 pp 1ndash102015

[131] T Cho ldquoEffects of morpheme boundaries on intergesturaltiming Evidence from Koreanrdquo Phonetica vol 58 no 3 pp129ndash162 2001

[132] S N Wood Generalized Additive Models Chapman ampHallCRC NY USA 2017

[133] M Seidenberg ldquoSublexical structures in visual word recogni-tion Access units or orthographic redundancyrdquo in Attentionand Performance XII M Coltheart Ed pp 245ndash264 LawrenceErlbaum Associates Hove 1987

[134] J M McQueen ldquoSegmentation of continuous speech usingphonotacticsrdquo Journal of Memory and Language vol 39 no 1pp 21ndash46 1998

[135] J B Hay Causes and Consequences of Word Structure Rout-ledge NY USA London UK 2003

[136] J B Hay ldquoFrom speech perception to morphology Affix-ordering revisitedrdquo Language vol 78 pp 527ndash555 2002

[137] J B Hay and R H Baayen ldquoPhonotactics parsing and produc-tivityrdquo Italian Journal of Linguistics vol 15 no 1 pp 99ndash1302003

[138] J Hay J Pierrehumbert and M Beckman ldquoSpeech perceptionwell-formedness and the statistics of the lexiconrdquo Papers inlaboratory phonology VI pp 58ndash74 2004

[139] M Ferro C Marzi and V Pirrelli ldquoA self-organizing modelof word storage and processing implications for morphologylearningrdquo Lingue e Linguaggio vol 10 no 2 pp 209ndash226 2011

[140] F ChersiM Ferro G Pezzulo andV Pirrelli ldquoTopological self-organization and prediction learning support both action andlexical chains in the brainrdquo Topics in Cognitive Science vol 6no 3 pp 476ndash491 2014

[141] V Pirrelli M Ferro and C Marzi ldquoComputational complexityof abstractive morphologyrdquo in Understanding and MeasuringMorphological Complexity M Bearman D Brown and G GCorbett Eds pp 141ndash166 Oxford University Press 2015

[142] C Marzi M Ferro and V Pirrelli ldquoIs inflectional irregularitydysfunctional to human processingrdquo in Abstract Booklet TheMental Lexicon 2018 V Kuperman Ed University of AlbertaEdmonton 2018

[143] E Keuleers M Stevens P Mandera and M Brysbaert ldquoWordknowledge in the crowd Measuring vocabulary size and wordprevalence in a massive online experimentrdquo The QuarterlyJournal of Experimental Psychology vol 68 no 8 pp 1665ndash16922015

[144] M Ernestus Voice Assimilation And Segment Reduction inCasual Dutch A Corpus-Based Study of the Phonology-PhoneticsInterface LOT Utrecht the Netherlands 2000

[145] R Kemps M Ernestus R Schreuder and H Baayen ldquoProcess-ing reducedword formsThe suffix restoration effectrdquoBrain andLanguage vol 90 no 1-3 pp 117ndash127 2004

[146] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 June 2005

[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.


[149] I. Plag, J. Homann, and G. Kunter, "Homophony and morphology: The acoustics of word-final S in English," Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.

[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, "A prosodic theory of non-concatenative morphology," Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, "A Fixed Prosodic Theory of Nonconcatenative Templatic morphology," Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S. Freud, "Childhood and concealing memories," in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E. Förstemann, Über deutsche Volksetymologie, Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010



[147] G Hickok ldquoThe architecture of speech production and the roleof the phoneme in speech processingrdquo Language Cognition andNeuroscience vol 29 no 1 pp 2ndash20 2014

[148] B V Tucker M Sims and R H Baayen Opposing Forceson Acoustic Duration Manuscript University of Alberta andUniversity of Tubingen 2018

Complexity 39

[149] I Plag J Homann and G Kunter ldquoHomophony and mor-phology The acoustics of word-final S in Englishrdquo Journal ofLinguistics vol 53 no 1 pp 181ndash216 2017

[150] F Tomaschek I Plag M Ernestus and R H BaayenModeling the Duration of Word-Final s in English withNaive Discriminative Learning Manuscript University ofSiegenTubingenNijmegen 2018

[151] R E Kalman ldquoA new approach to linear filtering and predictionproblemsrdquo Journal of Fluids Engineering vol 82 no 1 pp 35ndash451960

[152] B Russell ldquoOn denotingrdquo Mind vol 14 no 56 pp 479ndash4931905

[153] N Hornstein Logical Form FromGB toMinimalism Blackwell1995

[154] R Jackendoff Semantic StructuresMITPress Cambridge 1990[155] J J McCarthy ldquoA prosodic theory of non-concatenative mor-

phologyrdquo Linguistic Inquiry vol 12 pp 373ndash418 1981[156] A Ussishkin ldquoA Fixed Prosodic Theory of Nonconcatenative

Templaticmorphologyrdquo Natural Language amp Linguistic Theoryvol 23 no 1 pp 169ndash218 2005

[157] A Ussishkin ldquoAffix-favored contrast inequity and psycholin-guistic grounding for non-concatenative morphologyrdquo Mor-phology vol 16 no 1 pp 107ndash125 2006

[158] RW Young andWMorganTheNavajo Language A Grammarand Colloquial Dictionary University of New Mexico Press1980

[159] S C Levinson and A Majid ldquoThe island of time Yelı Dnye thelanguage of Rossel islandrdquo Frontiers in psychology vol 4 2013

[160] S C Levinson ldquoThe language of space in Yelı Dnyerdquo inGrammars of Space Explorations in Cognitive Diversity pp 157ndash203 Cambridge University Press 2006

[161] E Shafaei-Bajestan and R H Baayen ldquoWide Learning forAuditory Comprehensionrdquo in Proceedings of the Interspeech2018 2018

[162] J Hay and K Drager ldquoStuffed toys and speech perceptionrdquoLinguistics vol 48 no 4 pp 865ndash892 2010

[163] V Henri and C Henri ldquoOn our earliest recollections ofchildhoodrdquo Psychological Review vol 2 pp 215-216 1895

[164] S Freud ldquoChildhood and concealing memoriesrdquo in The BasicWritings of Sigmund Freud The Modern Library NY USA19051953

[165] P J Bauer and M Larkina ldquoThe onset of childhood amnesiain childhood A prospective investigation of the course anddeterminants of forgetting of early-life eventsrdquoMemory vol 22no 8 pp 907ndash924 2014

[166] M S Seidenberg and L M Gonnerman ldquoExplaining deriva-tional morphology as the convergence of codesrdquo Trends inCognitive Sciences vol 4 no 9 pp 353ndash361 2000

[167] M Coltheart K Rastle C Perry R Langdon and J ZieglerldquoDRC a dual route cascaded model of visual word recognitionand reading aloudrdquoPsychological Review vol 108 no 1 pp 204ndash256 2001

[168] M V Butz and E F Kutter How the Mind Comes into BeingIntroducing Cognitive Science from a Functional and Computa-tional Perspective Oxford University Press 2016

[169] M W Harm and M S Seidenberg ldquoPhonology readingacquisition and dyslexia Insights from connectionist modelsrdquoPsychological Review vol 106 no 3 pp 491ndash528 1999

[170] L M Gonnerman M S Seidenberg and E S AndersenldquoGraded semantic and phonological similarity effects in prim-ing Evidence for a distributed connectionist approach to

morphologyrdquo Journal of Experimental Psychology General vol136 no 2 pp 323ndash345 2007

[171] H Zhao O Gallo I Frosio and J Kautz ldquoLoss Functions forImage Restoration With Neural Networksrdquo IEEE Transactionson Computational Imaging vol 3 no 1 pp 47ndash57 2017

[172] C Clarke and P Luce ldquoPerceptual adaptation to speakercharacteristics Vot boundaries in stop voicing categorizationrdquoin Proceedings of the ISCA Workshop on Plasticity in SpeechPerception 2005

[173] DNorris JMMcQueen andACutler ldquoPerceptual learning inspeechrdquo Cognitive Psychology vol 47 no 2 pp 204ndash238 2003

[174] A Schweitzer and N Lewandowski ldquoSocial factors in conver-gence of f1 and f2 in spontaneous speechrdquo in Proceedings ofthe 10th International Seminar on Speech Production CologneGermany 2014

[175] M J Pickering and S Garrod ldquoToward a mechanistic psychol-ogy of dialoguerdquo Behavioral and Brain Sciences vol 27 no 2 pp169ndash190 2004

[176] C M Clarke and M F Garrett ldquoRapid adaptation to foreign-accented Englishrdquo The Journal of the Acoustical Society ofAmerica vol 116 no 6 pp 3647ndash3658 2004

[177] W J Levelt ldquoDie konnektionistische moderdquo Sprache UndKognition vol 10 no 2 pp 61ndash72 1991

[178] A Kostic ldquoInformational load constraints on processinginflected morphologyrdquo in Morphological Aspects of LanguageProcessing L B Feldman Ed Lawrence Erlbaum Inc Publish-ers New Jersey USA 1995

[179] J P BlevinsWord andParadigmMorphology OxfordUniversityPress 2016

[180] E Forstemann Uber Deutsche volksetymologie Zeitschrift furvergleichende Sprachforschung auf dem Gebiete des DeutschenGriechischen und Lateinischen 1852

[181] K Nault Morphological Therapy Protocal [Ph D thesis] Uni-versity of Alberta Edmonton 2010

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 35: The Discriminative Lexicon: A Unified Computational Model ...downloads.hindawi.com/journals/complexity/2019/4895891.pdf · as developed within computational linguistics and com-puter

Complexity 35

Data Availability

With the exception of the data from the Distributed Little Red Hen Lab, all primary data are publicly available from the cited sources.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are indebted to Geoff Hollis, Jessie Nixon, Chris Westbury, and Luis Mienhardt for their constructive feedback on earlier versions of this manuscript. This research was supported by an ERC advanced Grant (no. 742545) to the first author.


Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 36: The Discriminative Lexicon: A Unified Computational Model ...downloads.hindawi.com/journals/complexity/2019/4895891.pdf · as developed within computational linguistics and com-puter

36 Complexity

[35] T Mikolov I Sutskever K Chen G Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo inAdvances in Neural Information ProcessingSystems pp 3111ndash3119 2013

[36] A Roelofs ldquoThe WEAVER model of word-form encoding inspeech productionrdquo Cognition vol 64 no 3 pp 249ndash284 1997

[37] W J M Levelt A Roelofs and A S Meyer ldquoA theory of lexicalaccess in speech productionrdquoBehavioral and Brain Sciences vol22 no 1 pp 1ndash38 1999

[38] M Taft ldquoInteractive-activation as a Framework for Under-standing Morphological Processingrdquo Language and CognitiveProcesses vol 9 no 3 pp 271ndash294 1994

[39] R Schreuder and R H Baayen ldquoModeling morphologicalprocessingrdquo inMorphological Aspects of Language Processing LB Feldman Ed pp 131ndash154 Lawrence ErlbaumHillsdale NewJersey USA 1995

[40] B C Love D L Medin and T M Gureckis ldquoSUSTAIN ANetwork Model of Category Learningrdquo Psychological Reviewvol 111 no 2 pp 309ndash332 2004

[41] C J Marsolek ldquoWhat antipriming reveals about primingrdquoTrends in Cognitive Sciences vol 12 no 5 pp 176ndash181 2008

[42] M Ramscar and R Port ldquoCategorization (without categories)rdquoin Handbook of Cognitive Linguistics E Dabrowska and DDivjak Eds pp 75ndash99 De Gruyter Berlin Germany 2015

[43] R A Rescorla and A R Wagner ldquoA theory of Pavlovianconditioning Variations in the effectiveness of reinforcementand nonreinforcementrdquo in Classical Conditioning II CurrentResearch and Theory A H Black and W F Prokasy Eds pp64ndash99 Appleton Century Crofts NY USA 1972

[44] B Widrow and M E Hoff ldquoAdaptive switching circuitsrdquo 1960WESCON Convention Record Part IV pp 96ndash104 1960

[45] R H Baayen R Schreuder and R Sproat ldquoMorphology inthe Mental Lexicon A Computational Model for Visual WordRecognitionrdquo in Lexicon Development for Speech and LanguageProcessing F van Eynde and D Gibbon Eds pp 267ndash291Kluwer Academic Publishers 2000

[46] M Taft ldquoA morphological-decomposition model of lexicalrepresentationrdquo Linguistics vol 26 no 4 pp 657ndash667 1988

[47] MWHarmandM S Seidenberg ldquoComputing themeanings ofwords in reading Cooperative division of labor between visualand phonological processesrdquo Psychological Review vol 111 no3 pp 662ndash720 2004

[48] P C Trimmer J M McNamara A Houston and J A RMarshall ldquoDoes natural selection favour the Rescorla-Wagnerrulerdquo Journal of Theoretical Biology vol 302 pp 39ndash52 2012

[49] R R Miller R C Barnet and N J Grahame ldquoAssessment ofthe Rescorla-Wagner modelrdquo Psychological Bulletin vol 117 no3 pp 363ndash386 1995

[50] L J Kamin ldquoPredictability surprise attention and condition-ingrdquo in Punishment and Aversive Behavior B A Campbell andRMChurch Eds pp 276ndash296Appleton-Century-CroftsNYUSA 1969

[51] R A Rescorla ldquoPavlovian conditioning Itrsquos not what you thinkit isrdquo American Psychologist vol 43 no 3 pp 151ndash160 1988

[52] M Ramscar D Yarlett M Dye K Denny and K ThorpeldquoThe Effects of Feature-Label-Order and Their Implications forSymbolic Learningrdquo Cognitive Science vol 34 no 6 pp 909ndash957 2010

[53] M Ramscar and D Yarlett ldquoLinguistic self-correction in theabsence of feedback A new approach to the logical problem oflanguage acquisitionrdquo Cognitive Science vol 31 no 6 pp 927ndash960 2007

[54] R H Baayen P Milin D F ETHurđevic P Hendrix and MMarelli ldquoAn Amorphous Model for Morphological Process-ing in Visual Comprehension Based on Naive DiscriminativeLearningrdquo Psychological Review vol 118 no 3 pp 438ndash4812011

[55] F M Del Prado Martın R Bertram T Haikio R Schreuderand R H Baayen ldquoMorphological family size in a morphologi-cally rich languageThe case of Finnish compared to Dutch andHebrewrdquo Journal of Experimental Psychology LearningMemoryand Cognition vol 30 pp 1271ndash1278 2004

[56] P Milin D F Durdevic and F M Del Prado Martın ldquoThesimultaneous effects of inflectional paradigms and classes onlexical recognition Evidence from Serbianrdquo Journal of Memoryand Language vol 60 no 1 pp 50ndash64 2009

[57] B K Bergen ldquoThe psychological reality of phonaesthemesrdquoLanguage vol 80 no 2 pp 290ndash311 2004

[58] P Milin L B Feldman M Ramscar P Hendrix and R HBaayen ldquoDiscrimination in lexical decisionrdquo PLoS ONE vol 12no 2 2017

[59] R H Baayen P Milin and M Ramscar ldquoFrequency in lexicalprocessingrdquo Aphasiology vol 30 no 11 pp 1174ndash1220 2016

[60] M Marelli and M Baroni ldquoAffixation in semantic space Mod-eling morpheme meanings with compositional distributionalsemanticsrdquo Psychological Review vol 122 no 3 pp 485ndash5152015

[61] T Sering PMilin andRH Baayen ldquoLanguage comprehensionas amultiple label classification problemrdquo StatisticaNeerlandicapp 1ndash15 2018

[62] R Kaye and RWilson Linear Algebra Oxford University Press1998

[63] S H Ivens and B L Koslin Demands for Reading LiteracyRequire New Accountability Methods Touchstone Applied Sci-ence Associates 1991

[64] T K Landauer P W Foltz and D Laham ldquoAn introduction tolatent semantic analysisrdquoDiscourse Processes vol 25 no 2-3 pp259ndash284 1998

[65] H Schmid ldquoImprovements in Part-of-Speech Tagging with anApplication to Germanrdquo in Proceedings of the ACL SIGDAT-Workshop pp 13ndash25 Dublin Ireland 1995

[66] R H Baayen R Piepenbrock and L Gulikers The CELEXLexical Database (CD-ROM) Linguistic Data ConsortiumUniversity of Pennsylvania Philadelphia PA 1995

[67] J Mitchell and M Lapata ldquoVector-based models of semanticcompositionrdquo in Proceedings of the ACL pp 236ndash244 2008

[68] A Lazaridou M Marelli R Zamparelli and M BaronildquoCompositionally derived representations of morphologicallycomplex words in distributional semanticsrdquo in Proceedings ofthe ACL (1) pp 1517ndash1526 2013

[69] R Cotterell H Schutze and J Eisner ldquoMorphological smooth-ing and extrapolation of word embeddingsrdquo in Proceedings ofthe 54th Annual Meeting of the Association for ComputationalLinguistics vol 1 pp 1651ndash1660 2016

[70] B D Zeller S Pado and J Snajder ldquoTowards semantic valida-tion of a derivational lexiconrdquo in Proceedings of the COLING2014 the 25th International Conference on Computational Lin-guistics Technical Papers pp 1728ndash1739 2014

[71] H Marchand The Categories and Types of Present-Day EnglishWord Formation A Synchronic-Diachronic Approach BeckrsquoscheVerlagsbuchhandlung Munchen Germany 1969

[72] L Bauer EnglishWord-Formation Cambridge University PressCambridge 1983

Complexity 37

[73] I PlagWord-Formation in English CambridgeUniversity PressCambridge 2003

[74] R Beard Lexeme-Morpheme Base Morphology A GeneralTheory of Inflection and Word Formation State University ofNew York Press Albany NY USA 1995

[75] K Geeraert J Newman and R H Baayen ldquoIdiom VariationExperimental Data and a Blueprint of a Computational ModelrdquoTopics in Cognitive Science vol 9 no 3 pp 653ndash669 2017

[76] C Shaoul S Bitschau N Schilling et al ldquondl2 Naive discrim-inative learning an implementation in R R packagerdquo 2015

[77] P Milin D Divjak and R H Baayen ldquoA learning perspectiveon individual differences in skilled reading Exploring andexploiting orthographic and semantic discrimination cuesrdquoJournal of Experimental Psychology Learning Memory andCognition vol 43 no 11 pp 1730ndash1751 2017

[78] M Ramscar P Hendrix C Shaoul P Milin and R H BaayenldquoNonlinear dynamics of lifelong learning the myth of cognitivedeclinerdquo Topics in Cognitive Science vol 6 pp 5ndash42 2014

[79] M Ramscar C C Sun P Hendrix and R H Baayen ldquoTheMis-measurement of Mind Life-Span Changes in Paired-Associate-Learning Scores Reflect the ldquoCostrdquo of Learning Not CognitiveDeclinerdquo Psychological Science vol 28 no 8 pp 1171ndash1179 2017

[80] G desRosiers and D Ivison ldquoPaired associate learning Form 1and form 2 of the Wechsler memory scalerdquo Archives of ClinicalNeuropsychology vol 3 no 1 pp 47ndash67 1988

[81] E Bruni N K Tran andM Baroni ldquoMultimodal distributionalsemanticsrdquo Journal of Artificial Intelligence Research vol 49 pp1ndash47 2014

[82] W N Venables and B D RipleyModern Applied Statistics withS-PLUS Springer NY USA 2002

[83] A B Warriner V Kuperman and M Brysbaert ldquoNorms ofvalence arousal and dominance for 13915 English lemmasrdquoBehavior Research Methods vol 45 no 4 pp 1191ndash1207 2013

[84] D Kastovsky ldquoProductivity in word formationrdquo Linguistics vol24 no 3 pp 585ndash600 1986

[85] R H Baayen Y Chuang and J P Blevins ldquoInflectionalmorphology with linear mappingsrdquoTheMental Lexicon vol 13no 2 pp 232ndash270 2018

[86] Burnage CELEX A Guide for Users Centre for Lexical Infor-mation Nijmegen The Netherlands 1988

[87] C McBride-Chang ldquoModels of Speech Perception and Phono-logical Processing in Readingrdquo Child Development vol 67 no4 pp 1836ndash1856 1996

[88] G C Van Orden H Kloos et al ldquoThe question of phonologyand readingrdquo inThe Science of Reading A Handbook pp 61ndash782005

[89] D Jared and K OrsquoDonnell ldquoSkilled adult readers activate themeanings of high-frequency words using phonology Evidencefrom eye trackingrdquoMemoryampCognition vol 45 no 2 pp 334ndash346 2017

[90] G S Dell ldquoA Spreading-Activation Theory of Retrieval inSentence Productionrdquo Psychological Review vol 93 no 3 pp283ndash321 1986

[91] A Marantz ldquoNo escape from morphemes in morphologicalprocessingrdquo Language and Cognitive Processes vol 28 no 7 pp905ndash916 2013

[92] M Bozic W D Marslen-Wilson E A Stamatakis M HDavis and L K Tyler ldquoDifferentiating morphology formand meaning Neural correlates of morphological complexityrdquoCognitive Neuroscience vol 19 no 9 pp 1464ndash1475 2007

[93] R F Port andA P Leary ldquoAgainst formal phonologyrdquoLanguagevol 81 no 4 pp 927ndash964 2005

[94] J P Blevins ldquoStems and paradigmsrdquo Language vol 79 no 4 pp737ndash767 2003

[95] R Forsyth and D Holmes ldquoFeature-finding for test classifica-tionrdquo Literary and Linguistic Computing vol 11 no 4 pp 163ndash174 1996

[96] H Pham and R H Baayen ldquoVietnamese compounds showan anti-frequency effect in visual lexical decisionrdquo LanguageCognition and Neuroscience vol 30 no 9 pp 1077ndash1095 2015

[97] L Cohen and S Dehaene ldquoVentral and dorsal contributions toword readingrdquo inTheCognitive Neurosciences M S GazzanigaEd pp 789ndash804 Massachusetts Institute of Technology 2009

[98] E Keuleers P Lacey K Rastle and M Brysbaert ldquoThe BritishLexicon Project Lexical decision data for 28730 monosyllabicand disyllabic English wordsrdquo Behavior Research Methods vol44 no 1 pp 287ndash304 2012

[99] R K Wagner J K Torgesen and C A Rashotte ldquoDevel-opment of reading-related phonological processing abilitiesnew evidence of bidirectional causality from a latent variablelongitudinal studyrdquoDevelopmental Psychology vol 30 no 1 pp73ndash87 1994

[100] K F EWong andH-C Chen ldquoOrthographic and phonologicalprocessing in reading Chinese text Evidence from eye fixa-tionsrdquo Language and Cognitive Processes vol 14 no 5-6 pp461ndash480 1999

[101] R L Newman D Jared and C A Haigh ldquoDoes phonologyplay a role when skilled readers read high-frequency wordsEvidence from ERPsrdquo Language and Cognitive Processes vol 27no 9 pp 1361ndash1384 2012

[102] D Jared J Ashby S J Agauas and B A Levy ldquoPhonologicalactivation of word meanings in grade 5 readersrdquo Journal ofExperimental Psychology LearningMemory andCognition vol42 no 4 pp 524ndash541 2016

[103] T Bitan A Kaftory A Meiri-Leib Z Eviatar and O PelegldquoPhonological ambiguity modulates resolution of semanticambiguity during reading An fMRI study of Hebrewrdquo Neu-ropsychology vol 31 no 7 pp 759ndash777 2017

[104] D Jared and S Bainbridge ldquoReading homophone puns Evi-dence from eye trackingrdquo Canadian Journal of ExperimentalPsychology vol 71 no 1 pp 2ndash13 2017

[105] S Amenta M Marelli and S Sulpizio ldquoFrom sound tomeaning Phonology-to-Semantics mapping in visual wordrecognitionrdquo Psychonomic Bulletin amp Review vol 24 no 3 pp887ndash893 2017

[106] M Perrone-Bertolotti J Kujala J R Vidal et al ldquoHow silent issilent reading intracerebral evidence for top-down activationof temporal voice areas during readingrdquo The Journal of Neuro-science vol 32 no 49 pp 17554ndash17562 2012

[107] B Yao P Belin andC Scheepers ldquoSilent reading of direct versusindirect speech activates voice-selective areas in the auditorycortexrdquo Journal of Cognitive Neuroscience vol 23 no 10 pp3146ndash3152 2011

[108] M Coltheart B Curtis P Atkins and M Haller ldquoMod-els of Reading Aloud Dual-Route and Parallel-Distributed-Processing Approachesrdquo Psychological Review vol 100 no 4pp 589ndash608 1993

[109] M Coltheart ldquoModeling reading The dual-route approachrdquo inThe Science of Reading A Handbook pp 6ndash23 2005

38 Complexity

[110] M. Pitt, K. Johnson, E. Hume, S. Kiesling, and W. Raymond, “The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability,” Speech Communication, vol. 45, no. 1, pp. 89–95, 2005.

[111] K. Johnson, “Massive reduction in conversational American English,” in Proceedings of the Spontaneous Speech: Data and Analysis, Proceedings of the 1st Session of the 10th International Symposium, pp. 29–54, The National International Institute for Japanese Language, Tokyo, Japan, 2004.

[112] C. Phillips, “Levels of representation in the electrophysiology of speech perception,” Cognitive Science, vol. 25, no. 5, pp. 711–731, 2001.

[113] R. L. Diehl, A. J. Lotto, and L. L. Holt, “Speech perception,” Annual Review of Psychology, vol. 55, pp. 149–179, 2004.

[114] D. Norris and J. M. McQueen, “Shortlist B: a Bayesian model of continuous speech recognition,” Psychological Review, vol. 115, no. 2, pp. 357–395, 2008.

[115] S. Hawkins, “Roles and representations of systematic fine phonetic detail in speech understanding,” Journal of Phonetics, vol. 31, no. 3-4, pp. 373–405, 2003.

[116] C. Cucchiarini and H. Strik, “Automatic phonetic transcription: An overview,” in Proceedings of the 15th ICPhS, pp. 347–350, Barcelona, Spain, 2003.

[117] K. Johnson, The Auditory/Perceptual Basis for Speech Segmentation, Ohio State University Working Papers in Linguistics, 1997.

[118] H. Fletcher, “Auditory patterns,” Reviews of Modern Physics, vol. 12, no. 1, pp. 47–65, 1940.

[119] D. Arnold, “AcousticNDLCodeR: Coding sound files for use with NDL. R package version 1.0.1,” 2017.

[120] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016.

[121] J. Pickett and I. Pollack, “Intelligibility of excerpts from fluent speech: Effects of rate of utterance and duration of excerpt,” Language and Speech, vol. 6, pp. 151–164, 1963.

[122] L. Shockey, “Perception of reduced forms by non-native speakers of English,” in Sound Patterns of Spontaneous Speech, D. Duez, Ed., pp. 97–100, ESCA, Aix, 1998.

[123] M. Ernestus, H. Baayen, and R. Schreuder, “The recognition of reduced word forms,” Brain and Language, vol. 81, no. 1-3, pp. 162–173, 2002.

[124] R. H. Baayen, C. Shaoul, J. Willits, and M. Ramscar, “Comprehension without segmentation: a proof of concept with naive discriminative learning,” Language, Cognition and Neuroscience, vol. 31, no. 1, pp. 106–128, 2016.

[125] C. P. Browman and L. Goldstein, “Articulatory phonology: An overview,” Phonetica, vol. 49, no. 3-4, pp. 155–180, 1992.

[126] J. A. Tourville and F. H. Guenther, “The DIVA model: A neural theory of speech acquisition and production,” Language and Cognitive Processes, vol. 26, no. 7, pp. 952–981, 2011.

[127] G. Csardi and T. Nepusz, “The igraph software package for complex network research,” InterJournal, Complex Systems, vol. 1695, no. 5, pp. 1–9, 2006.

[128] M. Ernestus and R. H. Baayen, “Predicting the unpredictable: Interpreting neutralized segments in Dutch,” Language, vol. 79, no. 1, pp. 5–38, 2003.

[129] R. Weingarten, G. Nottbusch, and U. Will, “Morphemes, syllables, and graphemes in written word production,” in Multidisciplinary Approaches to Speech Production, T. Pechmann and C. Habel, Eds., pp. 529–572, Mouton de Gruyter, Berlin, Germany, 2004.

[130] R. Bertram, F. E. Tønnessen, S. Strömqvist, J. Hyönä, and P. Niemi, “Cascaded processing in written compound word production,” Frontiers in Human Neuroscience, vol. 9, pp. 1–10, 2015.

[131] T. Cho, “Effects of morpheme boundaries on intergestural timing: Evidence from Korean,” Phonetica, vol. 58, no. 3, pp. 129–162, 2001.

[132] S. N. Wood, Generalized Additive Models, Chapman & Hall/CRC, NY, USA, 2017.

[133] M. Seidenberg, “Sublexical structures in visual word recognition: Access units or orthographic redundancy?” in Attention and Performance XII, M. Coltheart, Ed., pp. 245–264, Lawrence Erlbaum Associates, Hove, 1987.

[134] J. M. McQueen, “Segmentation of continuous speech using phonotactics,” Journal of Memory and Language, vol. 39, no. 1, pp. 21–46, 1998.

[135] J. B. Hay, Causes and Consequences of Word Structure, Routledge, NY, USA; London, UK, 2003.

[136] J. B. Hay, “From speech perception to morphology: Affix-ordering revisited,” Language, vol. 78, pp. 527–555, 2002.

[137] J. B. Hay and R. H. Baayen, “Phonotactics, parsing and productivity,” Italian Journal of Linguistics, vol. 15, no. 1, pp. 99–130, 2003.

[138] J. Hay, J. Pierrehumbert, and M. Beckman, “Speech perception, well-formedness, and the statistics of the lexicon,” Papers in Laboratory Phonology VI, pp. 58–74, 2004.

[139] M. Ferro, C. Marzi, and V. Pirrelli, “A self-organizing model of word storage and processing: implications for morphology learning,” Lingue e Linguaggio, vol. 10, no. 2, pp. 209–226, 2011.

[140] F. Chersi, M. Ferro, G. Pezzulo, and V. Pirrelli, “Topological self-organization and prediction learning support both action and lexical chains in the brain,” Topics in Cognitive Science, vol. 6, no. 3, pp. 476–491, 2014.

[141] V. Pirrelli, M. Ferro, and C. Marzi, “Computational complexity of abstractive morphology,” in Understanding and Measuring Morphological Complexity, M. Bearman, D. Brown, and G. G. Corbett, Eds., pp. 141–166, Oxford University Press, 2015.

[142] C. Marzi, M. Ferro, and V. Pirrelli, “Is inflectional irregularity dysfunctional to human processing?” in Abstract Booklet, The Mental Lexicon 2018, V. Kuperman, Ed., University of Alberta, Edmonton, 2018.

[143] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, “Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment,” The Quarterly Journal of Experimental Psychology, vol. 68, no. 8, pp. 1665–1692, 2015.

[144] M. Ernestus, Voice Assimilation and Segment Reduction in Casual Dutch: A Corpus-Based Study of the Phonology-Phonetics Interface, LOT, Utrecht, the Netherlands, 2000.

[145] R. Kemps, M. Ernestus, R. Schreuder, and H. Baayen, “Processing reduced word forms: The suffix restoration effect,” Brain and Language, vol. 90, no. 1-3, pp. 117–127, 2004.

[146] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), vol. 1, pp. 886–893, June 2005.

[147] G. Hickok, “The architecture of speech production and the role of the phoneme in speech processing,” Language, Cognition and Neuroscience, vol. 29, no. 1, pp. 2–20, 2014.

[148] B. V. Tucker, M. Sims, and R. H. Baayen, Opposing Forces on Acoustic Duration, Manuscript, University of Alberta and University of Tübingen, 2018.

[149] I. Plag, J. Homann, and G. Kunter, “Homophony and morphology: The acoustics of word-final S in English,” Journal of Linguistics, vol. 53, no. 1, pp. 181–216, 2017.

[150] F. Tomaschek, I. Plag, M. Ernestus, and R. H. Baayen, Modeling the Duration of Word-Final s in English with Naive Discriminative Learning, Manuscript, University of Siegen/Tübingen/Nijmegen, 2018.

[151] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Fluids Engineering, vol. 82, no. 1, pp. 35–45, 1960.

[152] B. Russell, “On denoting,” Mind, vol. 14, no. 56, pp. 479–493, 1905.

[153] N. Hornstein, Logical Form: From GB to Minimalism, Blackwell, 1995.

[154] R. Jackendoff, Semantic Structures, MIT Press, Cambridge, 1990.

[155] J. J. McCarthy, “A prosodic theory of non-concatenative morphology,” Linguistic Inquiry, vol. 12, pp. 373–418, 1981.

[156] A. Ussishkin, “A fixed prosodic theory of nonconcatenative templatic morphology,” Natural Language & Linguistic Theory, vol. 23, no. 1, pp. 169–218, 2005.

[157] A. Ussishkin, “Affix-favored contrast inequity and psycholinguistic grounding for non-concatenative morphology,” Morphology, vol. 16, no. 1, pp. 107–125, 2006.

[158] R. W. Young and W. Morgan, The Navajo Language: A Grammar and Colloquial Dictionary, University of New Mexico Press, 1980.

[159] S. C. Levinson and A. Majid, “The island of time: Yélî Dnye, the language of Rossel Island,” Frontiers in Psychology, vol. 4, 2013.

[160] S. C. Levinson, “The language of space in Yélî Dnye,” in Grammars of Space: Explorations in Cognitive Diversity, pp. 157–203, Cambridge University Press, 2006.

[161] E. Shafaei-Bajestan and R. H. Baayen, “Wide learning for auditory comprehension,” in Proceedings of Interspeech 2018, 2018.

[162] J. Hay and K. Drager, “Stuffed toys and speech perception,” Linguistics, vol. 48, no. 4, pp. 865–892, 2010.

[163] V. Henri and C. Henri, “On our earliest recollections of childhood,” Psychological Review, vol. 2, pp. 215–216, 1895.

[164] S. Freud, “Childhood and concealing memories,” in The Basic Writings of Sigmund Freud, The Modern Library, NY, USA, 1905/1953.

[165] P. J. Bauer and M. Larkina, “The onset of childhood amnesia in childhood: A prospective investigation of the course and determinants of forgetting of early-life events,” Memory, vol. 22, no. 8, pp. 907–924, 2014.

[166] M. S. Seidenberg and L. M. Gonnerman, “Explaining derivational morphology as the convergence of codes,” Trends in Cognitive Sciences, vol. 4, no. 9, pp. 353–361, 2000.

[167] M. Coltheart, K. Rastle, C. Perry, R. Langdon, and J. Ziegler, “DRC: a dual route cascaded model of visual word recognition and reading aloud,” Psychological Review, vol. 108, no. 1, pp. 204–256, 2001.

[168] M. V. Butz and E. F. Kutter, How the Mind Comes into Being: Introducing Cognitive Science from a Functional and Computational Perspective, Oxford University Press, 2016.

[169] M. W. Harm and M. S. Seidenberg, “Phonology, reading acquisition, and dyslexia: Insights from connectionist models,” Psychological Review, vol. 106, no. 3, pp. 491–528, 1999.

[170] L. M. Gonnerman, M. S. Seidenberg, and E. S. Andersen, “Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology,” Journal of Experimental Psychology: General, vol. 136, no. 2, pp. 323–345, 2007.

[171] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.

[172] C. Clarke and P. Luce, “Perceptual adaptation to speaker characteristics: VOT boundaries in stop voicing categorization,” in Proceedings of the ISCA Workshop on Plasticity in Speech Perception, 2005.

[173] D. Norris, J. M. McQueen, and A. Cutler, “Perceptual learning in speech,” Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.

[174] A. Schweitzer and N. Lewandowski, “Social factors in convergence of F1 and F2 in spontaneous speech,” in Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany, 2014.

[175] M. J. Pickering and S. Garrod, “Toward a mechanistic psychology of dialogue,” Behavioral and Brain Sciences, vol. 27, no. 2, pp. 169–190, 2004.

[176] C. M. Clarke and M. F. Garrett, “Rapid adaptation to foreign-accented English,” The Journal of the Acoustical Society of America, vol. 116, no. 6, pp. 3647–3658, 2004.

[177] W. J. Levelt, “Die konnektionistische Mode,” Sprache und Kognition, vol. 10, no. 2, pp. 61–72, 1991.

[178] A. Kostić, “Informational load constraints on processing inflected morphology,” in Morphological Aspects of Language Processing, L. B. Feldman, Ed., Lawrence Erlbaum Inc. Publishers, New Jersey, USA, 1995.

[179] J. P. Blevins, Word and Paradigm Morphology, Oxford University Press, 2016.

[180] E. Förstemann, “Über deutsche Volksetymologie,” Zeitschrift für vergleichende Sprachforschung auf dem Gebiete des Deutschen, Griechischen und Lateinischen, 1852.

[181] K. Nault, Morphological Therapy Protocol [Ph.D. thesis], University of Alberta, Edmonton, 2010.
