
Computational Explorations in Cognitive Neuroscience


  • Foreword

    The Role of Computational Models in Cognitive Neuroscience

    The publication of O’Reilly and Munakata’s Computational Explorations in Cognitive Neuroscience comes at an opportune moment. The field is rapidly growing, with centers and institutes springing up everywhere. Researchers from backgrounds ranging from psychology to molecular biology are pouring into the field. Eric Kandel has suggested that “cognitive neuroscience—with its concern about perception, action, memory, language and selective attention—will increasingly come to represent the central focus of all neurosciences in the twenty-first century.”

    Today, quite a bit of the excitement in the field surrounds the use of several important new experimental methodologies. For the study of the neural basis of cognition in humans, fMRI and other imaging modalities hold great promise to allow us to visualize the brain while cognition is occurring, and it is likely that there will be ongoing breakthroughs in spatial and temporal resolution. Working upward from the molecular level, new genetic methods for creating animals with alterations in the basic functional properties of specific groups of neurons and synapses are allowing detailed exploration of how these cellular and synaptic processes impact higher processes such as spatial learning in animals. Within neurophysiology, the use of multielectrode arrays to record from as many as 100 separate neurons at a time has led to new insights into the representation of information during behavior, and the later reactivation of these representations from memory during sleep.

    With all their marvelous tools, the question arises: do we really need computational models in cognitive neuroscience? Can we not learn everything we need to know about the neural basis of cognition through experimental investigation? Do we need a book like the present one to explore the principles of neural computation and apply them to the task of understanding how cognition arises from neuronal interactions?

    The answer is: Yes, we do need computational models in cognitive neuroscience. To support this answer, I will begin by describing what I take to be one of the central goals of cognitive neuroscience. I will then describe what we mean by the phrase “a computational model” and consider the role such models can play in addressing the central goal. Along the way I hope to indicate some of the shortcomings of experimental research undertaken without the aid of computational models and how models can be used to go beyond these limitations. The goal is to make clear exactly what models are, and the role they are intended to play.

    First, what is this central goal of cognitive neuroscience? To me, and I think to O’Reilly, Munakata, and many researchers in the field, the goal is to understand how neural processes give rise to cognition. Typically, cognition is broadly construed to include perception, attention, language, memory, problem solving, planning, reasoning, and the coordination and execution of action. And typically some task or tasks are used to make behavioral observations that tap these underlying processes; aspects of conscious experience are included, to the extent that they can be subjected to scientific scrutiny through observables (including verbal reports or other readout methods). The processes considered may be ones that take place in a brief interval of time, such as the processes that occur when a human observer reads a visually presented word. Or they may be ones that take place over longer periods of time, such as the processes that occur as a child progresses through various developmental stages in understanding the role of weight and distance, say, in balance. Processes occurring in individuals with disorders are often of interest, in part to understand the disorder and in part for the light the disorder may shed on the effort to understand the “normal” case. Cognition as it occurs in humans is often the main focus, but animal models are often used, in part because there are clear structure-function homologues between humans and nonhuman animals in many domains of cognition, and in part because we can manipulate the nervous systems of nonhuman animals and invade their nervous systems with probes we cannot use in humans.

    Whatever the exact target phenomenon, the essential goal is to understand the mechanisms involved. Here, we must be extremely careful to distinguish different specific kinds of mechanistic goals. One might have the goal simply to provide a detailed characterization of the actual physical and chemical processes that underlie the cognitive processes in question. This is certainly a very worthwhile goal, and will be essential for medical applications of cognitive neuroscience. But most researchers who call themselves cognitive neuroscientists are probably looking for something more general; I think most researchers would like to say it is not the details themselves that matter but the principles that are embodied in these details. As one example, consider the observed phenomenon of depletion of neurotransmitter that occurs when a synapse is repeatedly activated by an incoming neural impulse. The exact details of the molecular mechanisms involved in this process are a focus of considerable interest in neuroscience. But for cognitive neuroscience, a key focus is on the fact that this depletion results in a temporary reduction in efficacy at the affected synapse. The principle that activity weakens synapses has been incorporated into several models, and then has been used to account for a variety of phenomena, including alternation between distinct interpretations of the same percept (also known as “bistable perception”). The point is that the basic principle can be built into a model that captures emergent perceptual phenomena without incorporating all of the underlying biophysical details.

    It may be worth noting here that some researchers who call themselves cognitive neuroscientists disavow a concern for the neural processes themselves that form the underlying basis of cognition and focus instead on what they take to be more fundamental issues that transcend the details. For example, many researchers consider the task of cognitive neuroscience to be one of finding the correct partitioning of the brain into distinct modules with isolable functions. This is a highly seductive enterprise, one that is reinforced by the fact that damage to particular brain regions can produce a profound deficit in the ability to perform certain cognitive tasks or to carry out a task for a particular type of item, while largely sparing performance of other tasks or other types of items.

    For example, damage to anterior language areas can have a large effect on the ability to produce the past tenses of words that exhibit the regular English past tense (such as “need” – “needed”), leaving performance on exception words (such as “take” – “took”) largely intact. On the other hand, damage to more posterior areas can sometimes lead to a large deficit in producing the past tenses of the exception words, with relative sparing of the regular items and a tendency to “regularize” exceptions (“wear” – “weared”). Such findings have often enticed cognitive neuroscientists to attribute performance on the different tasks or classes of items to different cognitive modules, and fMRI and other brain imaging methods showing differential activations of brain regions in different tasks or with different items are used in a similar way to assign functions to brain modules.

    Thus, one interpretation of the findings on verb inflection is that the system of rules that is used in processing the regular items is subserved by some part or parts of the anterior language processing system, while a lexicon or list of word-specific information specifying the correct past tenses of exceptions is subserved by some part or parts of the affected posterior areas (Ullman, Corkin, & Pinker, 1997). As appealing intuitively as this sort of inference may seem, there are two problems. One problem is that, as stated, it isn’t always clear whether the ideas are sufficient to provide a full account of the pattern of spared and impaired performance seen in both types of patients. An explicit model can help the scientific community work through in full detail the actual ability of a proposed set of modular mechanisms to account for the observed pattern of data. We will come back to this issue. Before doing so, it is important to consider the second problem with drawing inferences directly without an explicit model of the processes involved providing guidance.

    The second problem is that in the absence of an explicit process model, investigators often have a tendency to reify aspects of task differences, stimulus types, or types of errors in their theorizing about the underlying modular organization. Here, models help to illustrate that other interpretations may be possible, ones that may require fewer modules or independent loci of damage in the resulting account for normal and disordered behavior. For example, the connectionist model of Joanisse and Seidenberg (1999) offers an alternative to the Ullman et al. account of the pattern of deficits in forming past tenses. According to this model, damage to the posterior system disrupts the semantic representations of all types of words, while damage to the anterior system disrupts phonological processing of all types of words. The reason why anterior lesions have a disproportionate impact on inflection of regular words is that the regular inflection is both subtler to perceive and harder to produce, making it more sensitive to a phonological disruption. The reason why posterior lesions have a disproportionate impact on exceptions is that the semantics of a word provides distinct, word-specific input that is necessary for overriding the regular inflectional pattern that is typical of most words.

    The key point here is that an explicit computational perspective often leads to new ways of understanding observed phenomena that are apparently not always accessible to those who seek to identify subsystems without giving detailed consideration to the mechanisms involved. To demonstrate how commonly this arises, I will mention three other cases in point, two of which are explored more fully by O’Reilly and Munakata. All three suggest how a computational perspective has led to new interpretations of neuropsychological phenomena:

    • It has long been known that lesions in posterior parietal cortex lead to deficits in visual processing in the opposite visual field. Posner, Walker, Friedrich, and Rafal (1984) observed that patients with lateralized posterior parietal lesions had only a moderate deficit in detecting targets in the opposite visual field, and that the deficit became much larger when the task required shifting away from the unaffected visual field into the affected field, and proposed that this reflected a deficit in a specific neural module for disengagement of attention from the opposite side of space. But as O’Reilly and Munakata explore in a model in chapter 8, Cohen, Romero, Farah, and Servan-Schreiber (1994) later demonstrated that in a simple neural network, in which there are pools of neurons on each side of the brain, each responsible for attention to the opposite side of space, partial damage to the pool of neurons on one side led to a close fit to the whole pattern of data; no separate module for disengagement, over and above the basic mechanism of attention itself, was required to account for the data.

    • Several investigators have observed interesting patterns of disproportionate deficits in patients with problems with face recognition. For example, such patients can show profound deficits in the ability to name famous faces, yet show facilitation of reading a person’s name aloud when the face is presented along with it. Such a finding was once interpreted as suggesting that the damage had affected only conscious face processing (or access to consciousness of the results of an intact unconscious process). But Farah, O’Reilly, and Vecera (1993) later demonstrated that in a simple neural network, partial damage can easily have a disproportionate impact on the ability to produce a complete, correct answer, while nevertheless leaving enough residual sensitivity to specific faces to bias processing in other parts of the system.

    • The intriguing phenomenon of deep dyslexia was puzzling to neuropsychologists for many years (Coltheart, Patterson, & Marshall, 1980). An essential characteristic of this disorder is the fact that the patients sometimes make dramatic errors preserving meaning but completely failing to respect the visual or phonological aspects of the stimulus word. For example, such a patient might read “ROSE” as “TULIP.” This suggests a semantic deficit; yet at the same time all such patients also make visual errors, such as misreading “SYMPHONY” as “SYMPATHY,” for example. Neuropsychologists working from a modular perspective were forced to propose at least two distinct lesion loci to account for these effects; and this was deeply troubling to the modelers, who considered being forced to postulate multiple lesions for every single case to be unparsimonious. As O’Reilly and Munakata explore in a model in chapter 10, a computational approach has led to a far more parsimonious account. Hinton and Shallice (1991) (and later Plaut & Shallice, 1993) were able to show that in a simple neural network model that maps representations of letters onto representations of word meanings, a lesion anywhere in the network led to both visual and semantic errors. Lesions “closer to orthography” led to a greater number of visual errors, and lesions “closer to semantics” led to a greater number of semantic errors, but crucially, both kinds of lesions led to some errors of both types. Thus, the model suggests that a single locus of lesion may be consistent with the data after all.

    What these three examples and the earlier example about the past tense all illustrate is that the inference from data to the modular architecture of the mind is not at all straightforward, and that explicit computational models can provide alternatives to what in some cases appears to be a fairly simplistic reification of task or item differences into cognitive modules, and in other cases manifests as a reification of types of errors (semantic, visual) into lesion sites.

    Given that thinking about the underlying mechanism leads to alternative accounts for patterns of data, we must return to the crucial question of deciding which of several alternative proposals provides the “right” account. The question is very hard to answer without an implemented computational instantiation of all of the competing accounts, since it isn’t clear in advance of a specification and implementation just what the detailed predictions of each of the accounts might really be. The point is that a computational approach can lead both to appealing alternatives to intuitive accounts and to explicit predictions that can be compared to all aspects of data to determine which account is actually able to offer an adequate account. For this reason, explicit computational models (whether based on neural networks or on other frameworks) are becoming more and more central to the effort to understand the nature of the underlying mechanisms of cognition.

    Providing a detailed account of a body of empirical data has been the goal of a great deal of modeling work, but it is important to understand that models can be useful and informative, even when they are not fit in detail to a complex data set. Instead of viewing models only as data fitting tools, it seems preferable to view them as tools (implemented in a computer program) for exploring what a given set of postulates about some process or mechanism implies about its resulting behavior. In cognitive neuroscience, we are usually interested in understanding what mechanism or process might give rise to observables, either behavioral data obtained in a cognitive task or some observable neural phenomenon such as the receptive field properties and spatial distribution of neurons in visual cortex. We explore this by laying out a set of postulates that define an explicit computational process, its inputs and its initial conditions, and then we run the process on a computer to see how it behaves. Typically there will be outputs that are intended to correspond in some way to behavioral responses, and there may be internal variables that are intended to correspond to observable neural variables, such as neuronal firing rates.

    A very important point is that in this particular kind of work, the postulates built into a model need not represent the beliefs of the modeler; rather, they may represent a particular set of choices the modeler has made to try to gain insight into the model and, by proxy, the associated behavioral or biological phenomenon. A key aspect of the process of making good choices is abstraction and simplification. Unless models are kept as minimal as possible, it can become extremely difficult to understand them, or even to run the simulations quickly enough for them to serve as a useful part of the research process. On the other hand, it is essential that we maintain sufficient structure within the model to deal with the issues and phenomena that are of interest. The process of model development within cognitive neuroscience is an exploration — a search for the key principles that the models must embody, for the most direct and succinct way of capturing these principles, and for a clear understanding of how and why the model gives rise to the phenomena that we see exhibited in its behavior.


    This book by O’Reilly and Munakata is an important step in the progress of these explorations. The book represents an evolution from the earlier explorations represented by the “PDP books” (Parallel Distributed Processing: Explorations in the Microstructure of Cognition by Rumelhart, McClelland, and the PDP Research Group, 1986, and the companion Handbook, Explorations in Parallel Distributed Processing, by McClelland and Rumelhart, 1988). O’Reilly and Munakata have built on a set of computational principles that arose from the effort to constrain the parallel distributed processing framework for modeling cognitive processes (McClelland, 1993), and have instantiated them within an integrated computational framework incorporating additional principles associated with O’Reilly’s Leabra algorithm. They have taken the computational and psychological abstraction characteristic of the PDP work, while moving many of the properties of the framework closer to aspects of the underlying neural implementation. They have employed a powerful set of software tools, including a sophisticated graphical user interface and a full-featured scripting language, to create an impressive, state-of-the-art simulation tool. They have exploited the combined use of expository text and hands-on simulation exercises to illustrate the basic properties of processing, representation, and learning in networks, and they have used their integrated framework to implement close analogs of a number of the examples that were developed in the earlier PDP work to illustrate key aspects of the emergent behavior of neural networks. They have gone on to show how these models can be applied in a number of domains of cognitive neuroscience to offer alternatives to traditional approaches to a number of central issues. Overall this book represents an impressive effort to construct a framework for the further exploration of the principles and their implications for cognitive neuroscience.

    It is important, however, to be aware that the computational exploration of issues in cognitive neuroscience is still very much in its infancy. There is a great deal that remains to be discovered about learning, processing, and representation in the brain, and about how cognition emerges from the underlying neural mechanisms. As with the earlier PDP books, an important part of the legacy of this book is likely to be its influence on the next wave of researchers who will take the next steps in these explorations.

    James L. McClelland
    Center for the Neural Basis of Cognition
    February, 2000

  • Preface

    Computational approaches to cognitive neuroscience (computational cognitive neuroscience) focus on understanding how the brain embodies the mind, using biologically based computational models made up of networks of neuronlike units. Because this endeavor lies at the intersection of a number of different disciplines, including neuroscience, computation, and cognitive psychology, the boundaries of computational cognitive neuroscience are difficult to delineate, making it a subject that can be difficult to teach well. This book is intended to support the teaching of this subject by providing a coherent, principled introduction to the main ideas in the field. It is suitable for an advanced undergraduate or graduate course in one semester or quarter (the authors have each used this text for such courses at their respective universities), and also for researchers in related areas who want to learn more about this approach to understanding the relation between mind and brain.

    Any introductory text on the subject of computational cognitive neuroscience faces a potentially overwhelming set of compromises — one could write volumes on each of the different component aspects of computation, cognition, and neuroscience. Many existing texts have avoided these compromises by focusing on specific issues such as the firing patterns of individual neurons (e.g., Rieke, Warland, van Steveninck, & Bialek, 1996), mathematically oriented treatments of the computational properties of networks (e.g., Hertz, Krogh, & Palmer, 1991), or more abstract models of cognitive phenomena (e.g., Elman, Bates, Johnson, Karmiloff-Smith, Parisi, & Plunkett, 1996; Plunkett & Elman, 1997). However, we knew that our excitement in the field was based in large part on the wide scope of the issues involved in this endeavor — from biological and computational properties to cognitive function — which requires a broader perspective (hand in hand with a greater compromise on some of the details) than is captured in these texts.

    Thus, like many of our colleagues teaching similar courses, we continued to think that the original PDP (parallel distributed processing) volumes (Rumelhart, McClelland, & PDP Research Group, 1986c; McClelland, Rumelhart, & PDP Research Group, 1986; McClelland & Rumelhart, 1988) were the best texts for covering the broader scope of issues. Unlike many later works, these volumes present the computational and biological mechanisms from a distinctly cognitive perspective, and they make a serious attempt at modeling a range of cognitive phenomena. However, the PDP volumes are now somewhat dated and present an often confusing hodge-podge of different algorithms and ideas. Also, the simulation exercises were a separate volume, rather than being incorporated into the text to play an integral role in students’ understanding of the complex behavior of the models. Finally, the neuroscience got short shrift in this treatment, because most of the models relied on very abstract and somewhat biologically implausible mechanisms.

    Our objective in writing this text was therefore to replicate the scope (and excitement) of the original PDP volumes in a more modern, integrated, and unified manner that more tightly related biology and cognition and provided intuitive graphical simulations at every step along the way. We achieved this scope by focusing on a consistent set of principles that form bridges between computation, neuroscience, and cognition. Within this coherent framework, we cover a breadth and depth of simulations of cognitive phenomena unlike any other textbook that we know of. We provide a large number of modern, state-of-the-art, research-grade simulation models that readers explore in some detail as guided by the text, and that they can then explore further on their own.

    We are well aware that there is a basic tradeoff between consistency and diversity (e.g., exploitation versus exploration as emphasized in the reinforcement-learning paradigm). The field of computational cognitive neuroscience has generally been characterized more by the diversity of theoretical approaches and models than by any kind of explicit consistency. This diversity has been cataloged in places like the encyclopedic Arbib (1995) volume, where readers can find overview treatments of widely varying perspectives. We view this book as a complementary resource to such encyclopedic treatments. We focus on consistency, striving to abstract and present as much as possible a consensus view, guided by a basic set of well-developed principles, with brief pointers to major alternative perspectives.

    In summary, this book is an attempt to consolidate and integrate advances across a range of fields and phenomena into one coherent package, which can be digested relatively easily by the reader. At one level, the result can be viewed as just that — an integration and consolidation of existing knowledge. However, we have found that the process of putting all of these ideas together into one package has led to an emergent phenomenon in which the whole is greater than the sum of its parts. We come away with a sense of renewed excitement and interest in computational cognitive neuroscience after writing this book, and hope that you feel some of this, too.

  • Acknowledgments

    This book has benefited greatly from the generous input of a number of individuals. First, we thank our students for working through early drafts of the text and simulations, and for providing useful feedback from the perspective of the primary target audience for this book. From the University of Denver: David Bauer, Nick Bitz, Rebecca Betjemann, Senia Bozova, Kristin Brelsford, Nomita Chhabildas, Tom Delaney, Elizabeth Griffith, Jeff Grubb, Leane Guzzetta, Erik Johnston, Gabe Lebovich, John McGoldrick, Jamie Ogline, Joan Ross, Robbie Rossman, Jeanne Shinskey, Tracy Stackhouse, Jennifer Stedron, Rachel Tunick, Tara Wass, and Julie Wilbarger. And from the University of Colorado, Boulder: Anita Bowles, Mike Emerson, Michael Frank, Naomi Friedman, Tom Helman, Josh Hemann, Darrell Laham, Noelle LaVoie, Bryan Loughry, Ben Pageler, Chris Ritacca, Alan Sanfey, Rodolfo Soto, Steve Romero, Mike Vanelzakker, Jim Vanoverschelde, Tor Wager, Rebecca Washlow, and Ting-Yu Wu.

    We were very fortunate for the comments of the following colleagues who provided invaluable expertise and insight: Dan Barth, Lyle Bourne, Axel Cleeremans and colleagues, Martha Farah, Lew Harvey, Alex Holcombe, Jim Hoeffner, Jan Keenan, Akira Miyake, Mike Mozer, David Noelle, Ken Norman, Dick Olson, Bruce Pennington, Jerry Rudy, and Jack Werner. We reserve special thanks for M. Frank Norman for a very thorough reading of the entire manuscript and careful attention to the mathematical equations, and for Chad Marsolek who gave us extensive feedback after teaching from the book.

    Michael Rutter at MIT Press was very supportive throughout the development, writing, and reviewing process of the book — he is good enough to make us almost feel like writing another book! We will have to see how this one does first. Katherine Almeida at MIT Press and the entire production staff made the production process remarkably smooth — thanks!

    Peter Dayan deserves special gratitude for inspiring us to write this thing in the first place. Josh Dorman was very generous in agreeing to do the cover for the book, which is entitled “Yonder: Interior” — please buy his artwork (we did)! His work is represented by 55 Mercer Gallery in New York (212) 226-8513, and on the web at: www.sirius.com/ zknower/josh.

    Finally, we owe the greatest debt to Jay McClelland, who was our mentor throughout graduate school and continues to inspire us. It will be obvious to all that many of the framing ideas in this book are based on Jay’s pioneering work, and that he deserves extraordinary credit for shaping the field and maintaining a focus on cognitive issues.

    RO and YM were supported by NSF KDI/LIS grant IBN-9873492. RO was supported by NIH Program Project MH47566, and YM was supported by NICHD 1R29 HD37163-01 and NIMH 1R03 MH59066-01.


  • Chapter 1

    Introduction and Overview

    Contents

    1.1 Computational Cognitive Neuroscience
    1.2 Basic Motivations for Computational Cognitive Neuroscience
        1.2.1 Physical Reductionism
        1.2.2 Reconstructionism
        1.2.3 Levels of Analysis
        1.2.4 Scaling Issues
    1.3 Historical Context
    1.4 Overview of Our Approach
    1.5 General Issues in Computational Modeling
    1.6 Motivating Cognitive Phenomena and Their Biological Bases
        1.6.1 Parallelism
        1.6.2 Gradedness
        1.6.3 Interactivity
        1.6.4 Competition
        1.6.5 Learning
    1.7 Organization of the Book
    1.8 Further Reading

    1.1 Computational Cognitive Neuroscience

    How does the brain think? This is one of the most challenging unsolved questions in science. Armed with new methods, data, and ideas, researchers in a variety of fields bring us closer to fully answering this question each day. We can even watch the brain as it thinks, using modern neuroimaging machines that record the biological shadows of thought and transform them into vivid color images. These amazing images, together with the results from many other important techniques, have advanced our understanding of the neural bases of cognition considerably. We can consolidate these various different approaches under the umbrella discipline of cognitive neuroscience, which has as its goal answering this most important of scientific questions.

    Cognitive neuroscience will remain a frontier for many years to come, because both thoughts and brains are incredibly complex and difficult to understand. Sequences of images of the brain thinking reveal a vast network of glowing regions that interact in complex ways with changing patterns of thought. Each picture is worth a thousand words — indeed, language often fails us in the attempt to capture the richness and subtlety of it all. Computational models based on biological properties of the brain can provide an important tool for understanding all of this complexity. Such models can capture the flow of information from your eyes recording these letters and words, up to the parts of your brain activated by the different word meanings, resulting in an integrated comprehension of this text. Although our understanding of such phenomena is still incomplete, these models enable us to explore their underlying mechanisms, which we can implement on a computer and manipulate, test, and ultimately understand.

    This book provides an introduction to this emerging subdiscipline known as computational cognitive neuroscience: simulating human cognition using biologically based networks of neuronlike units (neural networks). We provide a textbook-style treatment of the central ideas in this field, integrated with computer simulations that allow readers to undertake their own explorations of the material presented in the text. An important and unique aspect of this book is that the explorations include a number of large-scale simulations used in recent original research projects, giving students and other researchers the opportunity to examine these models up close and in detail.

    In this chapter, we present an overview of the basic motivations and history behind computational cognitive neuroscience, followed by an overview of the subsequent chapters covering basic neural computational mechanisms (part I) and cognitive phenomena (part II). Using the neural network models in this book, you will be able to explore a wide range of interesting cognitive phenomena, including:

    Visual encoding: A neural network will view natural scenes (mountains, trees, etc.), and, using some basic principles of learning, will develop ways of encoding these visual scenes much like those your brain uses to make sense of the visual world.

    Spatial attention: By taking advantage of the interactions between two different streams of visual processing, you can see how a model focuses its attention in different locations in space, for example to scan a visual scene. Then, you can use this model to simulate the attention performance of normal and brain-damaged people.

    Episodic memory: By incorporating the structure of the brain area called the hippocampus, a neural network will become able to form new memories of everyday experiences and events, and will simulate human performance on memory tasks.

    Working memory: You will see that specialized biological mechanisms can greatly improve a network’s working memory (the kind of memory you need to multiply 42 by 17 in your head, for example). Further, you will see how the skilled control of working memory can be learned through experience.

    Word reading: You can see how a network can learn to read and pronounce nearly 3,000 English words. Like human subjects, this network can pronounce novel nonwords that it has never seen before (e.g., “mave” or “nust”), demonstrating that it is not simply memorizing pronunciations — instead, it learns the complex web of regularities that govern English pronunciation. And, by damaging a model that captures the many different ways that words are represented in the brain, you can simulate various forms of dyslexia.

    Semantic representation: You can explore a network that has “read” every paragraph in this textbook and in the process acquired a surprisingly good understanding of the words used therein, essentially by noting which words tend to be used together or in similar contexts.

    Task directed behavior: You can explore a model of the “executive” part of the brain, the prefrontal cortex, and see how it can keep us focused on performing the task at hand while protecting us from getting distracted by other things going on.

    Deliberate, explicit cognition: A surprising number of things occur relatively automatically in your brain (e.g., you are not aware of exactly how you translate these black and white strokes on the page into some sense of what these words are saying), but you can also think and act in a deliberate, explicit fashion. You’ll explore a model that exhibits both of these types of cognition within the context of a simple categorization task, and in so doing, provides the beginnings of an account of the biological basis of conscious awareness.

    1.2 Basic Motivations for Computational Cognitive Neuroscience

    1.2.1 Physical Reductionism

    The whole idea behind cognitive neuroscience is the once radical notion that the mysteries of human thought can be explained in much the same way as everything else in science — by reducing a complex phenomenon (cognition) into simpler components (the underlying biological mechanisms of the brain). This process is just reductionism, which has been and continues to be the standard method of scientific advancement across most fields. For example, all matter can be reduced to its atomic components, which helps to explain the various properties of different kinds of matter, and the ways in which they interact. Similarly, many biological phenomena can be explained in terms of the actions of underlying DNA and proteins.

    Although it is natural to think of reductionism in terms of physical systems (e.g., explaining cognition in terms of the physical brain), it is also possible to achieve a form of reductionism in terms of more abstract components of a system. Indeed, one could argue that all forms of explanation entail a form of reductionism, in that they explain a previously inexplicable thing in terms of other, more familiar constructs, just as one can understand the definition of an unfamiliar word in the dictionary in terms of more familiar words.

    There have been many attempts over the years to explain human cognition using various different languages and metaphors. For example, can cognition be explained by assuming it is based on simple logical operations? By assuming it works just like a standard serial computer? Although these approaches have borne some fruit, the idea that one should look to the brain itself for the language and principles upon which to explain human cognition seems more likely to succeed, given that the brain is ultimately responsible for it all. Thus, it is not just reductionism that defines the essence of cognitive neuroscience — it is also the stipulation that the components be based on the physical substrate of human cognition, the brain. This is physical reductionism.

    As a domain of scientific inquiry matures, there is a tendency for constructs that play a role in that domain to become physically grounded. For example, in the biological sciences before the advent of modern molecular biology, ephemeral, vitalistic theories were common, where the components were posited based on a theory, not on any physical evidence for them. As the molecular basis of life was understood, it became possible to develop theories of biological function in terms of real underlying components (proteins, nucleic acids, etc.) that can be measured and localized. Some prephysical theoretical constructs accurately anticipated their physically grounded counterparts (for example, Mendel’s theory of genetics anticipated many important functional aspects of DNA replication), while others did not fare so well.

    Similarly, many previous and current theories of human cognition are based on constructs such as “attention” and “working memory buffers” that are based on an analysis of behaviors or thoughts, and not on physical entities that can be independently measured. Cognitive neuroscience differs from other forms of cognitive theorizing in that it seeks to explain cognitive phenomena in terms of underlying neurobiological components, which can in principle be independently measured and localized. Just as in biology and other fields, some of the nonphysical constructs of cognition will probably fit well with the underlying biological mechanisms, and others may not (e.g., Churchland, 1986). Even in those that fit well, understanding their biological basis will probably lead to a more refined and sophisticated understanding (e.g., as knowing the biological structure of DNA has for understanding genetics).

    1.2.2 Reconstructionism

    However, reductionism in all aspects of science — particularly in the study of human cognition — can suffer from an inappropriate emphasis on the process of reducing phenomena into component pieces, without the essential and complementary process of using those pieces to reconstruct the larger phenomenon. We refer to this latter process as reconstructionism. It is simply not enough to say that the brain is made of neurons; one must explain how billions of neurons interacting with each other produce human cognition. Teitelbaum (1967) argued for a similar complementarity of scientific processes — analysis and synthesis — in the study of physiological psychology. Analysis entails dissecting and simplifying a system to understand its essential elements; synthesis entails combining elements and understanding their interactions.

    The computational approach to cognitive neuroscience becomes critically important in reconstructionism: it is very difficult to use verbal arguments to reconstruct human cognition (or any other complex phenomenon) from the action of a large number of interacting components. Instead, we can implement the behavior of these components in a computer program and test whether they are indeed capable of reproducing the desired phenomena. Such simulations are crucial to developing our understanding of how neurons produce cognition. This is especially true when there are emergent phenomena that arise from these interactions without obviously being present in the behavior of individual elements (neurons) — where the whole is greater than the sum of its parts. The importance of reconstructionism is often overlooked in all areas of science, not just cognitive neuroscience, and the process has really only recently become feasible with the advent of relatively affordable fast computers.

    Figure 1.1: Illustration of the importance of reconstructionism — it is not enough to say that the system is composed of components (e.g., two gears as in a); one must also show how these components interact to produce overall behaviors. In b, the two gears interact to produce changes in rotational speed and torque — these effects emerge from the interaction, and are not a property of each component individually.

    Figure 1.1 shows a simple illustration of the importance of reconstructionism in understanding how systems behave. Here, it is not sufficient to say that the system is composed of two components (the two gears shown in panel a). Instead, one must also specify that the gears interact as shown in panel b, because it is only through this interaction that the important “behavioral” properties of changes in rotational speed and torque can emerge. For example, if the smaller gear drives the larger gear, this achieves a decrease in rotational speed and an increase in torque. However, if this same driving gear were to interact with a gear that was even smaller than it, it would produce the opposite effect. This is essentially what it means for the behavior to emerge from the interaction between the two gears, because it is clearly not a property of the individual gears in isolation. Similarly, cognition is an emergent phenomenon of the interactions of billions of neurons. It is not sufficient to say that the cognitive system is composed of billions of neurons; we must instead specify how these neurons interact to produce cognition.
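
    To put numbers on the gear example, here is a minimal Python sketch (our own illustration, not from the text; it assumes ideal, frictionless gears, and the function name driven_output is just a placeholder). The behavior of the pair is determined by the ratio between the two gears, a property of the interaction rather than of either gear alone.

        def driven_output(drive_teeth, driven_teeth, speed, torque):
            # Ideal meshed gears: output speed scales by the inverse of the
            # tooth ratio, output torque scales by the tooth ratio.
            ratio = driven_teeth / drive_teeth
            return speed / ratio, torque * ratio

        # The same 10-tooth driving gear produces opposite effects
        # depending on what it meshes with:
        print(driven_output(10, 30, speed=60.0, torque=1.0))  # (20.0, 3.0)  slower, stronger
        print(driven_output(10, 5, speed=60.0, torque=1.0))   # (120.0, 0.5) faster, weaker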

    1.2.3 Levels of Analysis

    Although the physical reductionism and reconstructionism motivations behind computational cognitive neuroscience may appear sound and straightforward, this approach to understanding human cognition is challenged by the extreme complexity of and lack of knowledge about both the brain and the cognition it produces. As a result, many researchers have appealed to the notion of hierarchical levels of analysis to deal with this complexity. Clearly, some levels of underlying mechanism are more appropriate for explaining human cognition than others. For example, it appears foolhardy to try to explain human cognition directly in terms of atoms and simple molecules, or even proteins and DNA. Thus, we must focus instead on higher level mechanisms. However, exactly which level is the “right” level is an important issue that will only be resolved through further scientific investigation. The level presented in this book represents our best guess at this time.

    One approach toward thinking about the issue of levels of analysis was suggested by David Marr (1982), who introduced the seductive notion of computational, algorithmic, and implementational levels by forging an analogy with the computer. Take the example of a program that sorts a list of numbers. One can specify in very abstract terms that the computation performed by this program is to arrange the numbers such that the smallest one is first in the list, the next largest one is next, and so on. This abstract computational level of analysis is useful for specifying what different programs do, without worrying about exactly how they go about doing it. Think of it as the “executive summary.”

    The algorithmic level then delves into more of the details as to how sorting actually occurs — there are many different strategies that one could adopt, and they have various tradeoffs in terms of factors such as speed or amount of memory used. Critically, the algorithm provides just enough information to implement the program, but does not specify any details about what language to program it in, what variable names to use, and so on. These details are left for the implementational level — how the program is actually written and executed on a particular computer using a particular language.
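
    The sorting example can be made concrete with a small Python sketch (our own illustration, not from the text). The check in is_sorted states the computational level (what must be achieved), insertion_sort and the built-in sorted are two different algorithmic-level strategies for achieving it, and the fact that this version happens to be written in Python and run on a particular machine is the implementational level.

        def is_sorted(xs):
            # Computational level: the specification of what sorting achieves.
            return all(a <= b for a, b in zip(xs, xs[1:]))

        def insertion_sort(xs):
            # Algorithmic level: one strategy among many, simple but O(n^2).
            out = []
            for x in xs:
                i = len(out)
                while i > 0 and out[i - 1] > x:
                    i -= 1
                out.insert(i, x)
            return out

        nums = [42, 17, 3, 99, 17]
        assert is_sorted(insertion_sort(nums))
        assert is_sorted(sorted(nums))  # a different algorithm, same computation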


    Marr’s levels and corresponding emphasis on the computational and algorithmic levels were born out of the early movements of artificial intelligence, cognitive psychology, and cognitive science, which were based on the idea that one could ignore the underlying biological mechanisms of cognition, focusing instead on identifying important computational or cognitive level properties. Indeed, these traditional approaches were based on the assumption that the brain works like a standard computer, and thus that Marr’s computational and algorithmic levels were much more important than the “mere details” of the underlying neurobiological implementation.

    The optimality or rational analysis approach, which is widely employed across the “sciences of complexity” from biology to psychology and economics (e.g., Anderson, 1990), shares the Marr-like emphasis on the computational level. Here, one assumes that it is possible to identify the “optimal” computation or function performed by a person or animal in a given context, and that whatever the brain is doing, it must somehow be accomplishing this same optimal computation (and can therefore be safely ignored). For example, Anderson (1990) argues that memory retention curves are optimally tuned to the expected frequency and spacing of retrieval demands for items stored in memory. Under this view, it doesn’t really matter how the memory retention mechanisms work, because they are ultimately driven by the optimality criterion of matching expected demands for items, which in turn is assumed to follow general laws.

    Although the optimality approach may sound attractive, the definition of optimality all too often ends up being conditioned on a number of assumptions (including those about the nature of the underlying implementation) that have no real independent basis. In short, optimality can rarely be defined in purely “objective” terms, and so often what is optimal in a given situation depends on the detailed circumstances.

    Thus, the dangerous thing about both Marr’s levels and these optimality approaches is that they appear to suggest that the implementational level is largely irrelevant. In most standard computers and languages, this is true, because they are all effectively equivalent at the implementational level, so that the implementational issues don’t really affect the algorithmic and computational levels of analysis. Indeed, computer algorithms can be turned into implementations by the completely automatic process of compilation. In contrast, in the brain, the neural implementation is certainly not derived automatically from some higher-level description, and thus it is not obviously true that it can be easily described at these higher levels.

    In effect, the higher-level computational analysis has already assumed a general implementational form, without giving proper credit to it for shaping the whole enterprise in the first place. However, with the advent of parallel computers, people are beginning to realize the limitations of computation and algorithms that assume the standard serial computer with address-based memory — entirely new classes of algorithms and ways of thinking about problems are being developed to take advantage of parallel computation. Given that the brain is clearly a parallel computer, having billions of computing elements (neurons), one must be very careful in importing seductively simple ideas based on standard computers.

    On the other end of the spectrum, various researchers have emphasized the implementational level as primary over the computational and algorithmic. They have argued that cognitive models should be assembled by making extremely detailed replicas of neurons, thus guaranteeing that the resulting model contains all of the important biological mechanisms (e.g., Bower, 1992). The risk of this approach is complementary to that of approaches that emphasize a purely computational level: without any clear understanding of which biological properties are functionally important and which are not, one ends up with massive, complicated models that are difficult to understand, and that provide little insight into the critical properties of cognition. Further, these models inevitably fail to represent all of the biological mechanisms in their fullest possible detail, so one can never be quite sure that something important is not missing.

    Instead of arguing for the superiority of one level over the other, we adopt a fully interactive, balanced approach, which emphasizes forming connections between data across all of the relevant levels, and striking a reasonable balance between the desire for a simplified model and the desire to incorporate as much of the known biological mechanisms as possible. There is a place for bottom-up (i.e., working from biological facts “up” to cognition), top-down (i.e., working from cognition “down” to biological facts), and, most important, interactive approaches, where one tries to simultaneously take into account constraints at the biological and cognitive levels.

    Figure 1.2: The two basic levels of analysis used in this text (neurobiological mechanisms and cognitive phenomena), with an intermediate level of principles to help forge the links.

    For example, it can be useful to take a set of facts about how neurons behave, encode them in a set of equations in a computer program, and see how the kinds of behaviors that result depend on the properties of these neurons. It can also be useful to think about what cognition should be doing in a particular case (e.g., at the computational level, or on some other principled basis), and then derive an implementation that accomplishes this, and see how well that characterizes what we know about the brain, and how well it does the cognitive job it is supposed to do. This kind of interplay between neurobiological, cognitive, and principled (computational and otherwise) considerations is emphasized throughout the text.

    To summarize our approach, and to avoid the unintended associations with Marr’s terminology, we adopt the following hierarchy of analytical levels (figure 1.2). At its core, we have essentially a simple bi-level physical reductionist/reconstructionist hierarchy, with a lower level consisting of neurobiological mechanisms, and an upper level consisting of cognitive phenomena. We will reduce cognitive phenomena to the operation of neurobiological mechanisms, and show, through simulations, how these mechanisms produce emergent cognitive phenomena. Of course, our simulations will have to rely on simplified, abstracted renditions of the neurobiological mechanisms.

    To help forge links between these two levels of analysis, we have an auxiliary intermediate level consisting of principles presented throughout the text. We do not think that either the brain or cognition can be fully described by these principles, which is why they play an auxiliary role and are shown off to one side of the figure. However, they serve to highlight and make clear the connection between certain aspects of the biology and certain aspects of cognition. Often, these principles are based on computational-level descriptions of aspects of cognition. But we want to avoid any implication that these principles provide some privileged level of description (i.e., like Marr’s view of the computational level) that tempts us into thinking that data at the two basic empirical levels (cognition and neurobiology) are less relevant. Instead, these principles are fundamentally shaped by, and help to strike a good balance between, the two primary levels of analysis.

    The levels of analysis issue is easily confused with different levels of structure within the nervous system, but these two types of levels are not equivalent. The relevant levels of structure range from molecules to individual neurons to small groups or columns of neurons to larger areas or regions of neurons up to the entire brain itself. Although one might be tempted to say that our cognitive phenomena level of analysis should be associated with the highest structural level (the entire brain), and our neurobiological mechanisms level of analysis associated with lower structural levels, this is not really accurate. Indeed, some cognitive phenomena can be traced directly to properties of individual neurons (e.g., that they exhibit a fatiguelike phenomenon if activated too long), whereas other cognitive phenomena only emerge as a result of interactions among a number of different brain areas. Furthermore, as we progress from lower to higher structural levels in successive chapters of this book, we emphasize that specific computational principles and cognitive phenomena can be associated with each of these structural levels. Thus, just as there is no privileged level of analysis, there is no privileged structural level — all of these levels must be considered in an interactive fashion.

    1.2.4 Scaling Issues

    Having adopted essentially two levels of analysis, we are in the position of using biological mechanisms operating at the level of individual neurons to explain even relatively complex, high-level cognitive phenomena. This raises the question as to why these basic neural mechanisms should have any relevance to understanding something that is undoubtedly the product of millions or even billions of neurons — certainly we do not include anywhere near that many neurons in our simulations! This scaling issue relates to the way in which we construct a scaled-down model of the real brain. It is important to emphasize that the need for scaling is at least partially a pragmatic issue having to do with the limitations of currently available computational resources. Thus, it should be possible to put the following arguments to the test in the future as larger, more complex models can be constructed. However, scaled-down models are also easier to understand, and are a good place to begin the computational cognitive neuroscience enterprise.

    We approach the scaling problem in the following ways.

    • The target cognitive behavior that we expect (and obtain) from the models is similarly scaled down compared to the complexities of actual human cognition.

    • We show that one of our simulated neurons (units) in the model can approximate the behavior of many real neurons, so that we can build models of multiple brain areas where the neurons in those areas are simulated by many fewer units.

    • We argue that information processing in the brain has a fractal quality, where the same basic properties apply across disparate physical scales. These basic properties are those of individual neurons, which “show through” even at higher levels, and are thus relevant to understanding even the large-scale behavior of the brain.

The first argument amounts to the idea that our neural network models are performing essentially the same type of processing as a human in a particular task, but on a reduced problem that either lacks the detailed information content of the human equivalent or represents a subset of these details. Of course, many phenomena can become qualitatively different as they get scaled up or down along this content dimension, but it seems reasonable to allow that some important properties might be relatively scale invariant. For example, one could plausibly argue that each major area of the human cortex could be reduced to handle only a small portion of the content that it actually does (e.g., by the use of a 16x16 pixel retina instead of 16 million x 16 million pixels), but that some important aspects of the essential computation on any piece of that information are preserved in the reduced model. If several such reduced cortical areas were connected, one could imagine having a useful but simplified model of some reasonably complex psychological phenomena.

Figure 1.3: Illustration of scaling as performed on an image — the original image in (a) was scaled down by a factor of 8, retaining only 1/8th of the original information, and then scaled back up to the same size and averaged (blurred) to produce (b), which captures many of the general characteristics of the original, but not the fine details. Our models give us something like this scaled-down, averaged image of how the brain works.
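To make the scaling analogy of figure 1.3 concrete, here is a minimal sketch in Python with NumPy (which the book itself does not use) of block-averaging an image down and then blowing it back up; the function name, the factor of 8, and the random test image are all illustrative assumptions, not part of the original figure.

```python
import numpy as np

def scale_down_up(img, factor=8):
    """Block-average an image by `factor`, then blow it back up to the
    original size, mimicking the scaling analogy of figure 1.3."""
    h, w = img.shape
    # Crop so the dimensions divide evenly by the scaling factor.
    h, w = h - h % factor, w - w % factor
    img = img[:h, :w]
    # Downscale: average each factor x factor block into one coarse "pixel."
    small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # Upscale: repeat each averaged value back out to the original size.
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

# Example: a random "image"; the result preserves coarse structure only.
blurred = scale_down_up(np.random.rand(64, 64))
```

The coarse structure of the original survives this round trip even though the fine detail does not, which is the sense in which a scaled-down model can still capture the overall shape of what a much larger system is doing.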

The second argument can perhaps be stated most clearly by imagining that an individual unit in the model approximates the behavior of a population of essentially identical neurons. Thus, whereas actual neurons are discretely spiking, our model units typically (but not exclusively) use a continuous, graded activation signal. We will see in chapter 2 that this graded signal provides a very good approximation to the average number of spikes per unit time produced by a population of spiking neurons. Of course, we don’t imagine that the brain is constructed from populations of identical neurons, but we do think that the brain employs overlapping distributed representations, so that an individual model unit can represent the centroid of a set of such representations. Thus, the population can encode much more information (e.g., many finer shades of meaning), and is probably different in other important ways (e.g., it might be more robust to the effects of noise). A visual analogy for this kind of scaling is shown in figure 1.3, where the sharp, high-resolution detail of the original (panel a) is lost in the scaled-down version (panel b), but the basic overall structure is preserved.
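As a rough illustration of why a graded activation value can stand in for a population of spiking neurons, the following sketch (again Python/NumPy; the population size, time step, and independent Poisson spiking are illustrative assumptions rather than the book's simulation framework) measures the average firing rate of many spiking units driven at a common rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def population_rate(graded_activation, n_neurons=100, n_steps=1000, dt=0.001):
    """Simulate a population of independent Poisson-spiking neurons whose
    per-step firing probability is set by a graded activation value
    (interpreted here as spikes per second), and return the measured
    average firing rate of the whole population."""
    p_spike = graded_activation * dt               # spike probability per time step
    spikes = rng.random((n_neurons, n_steps)) < p_spike
    return spikes.sum() / (n_neurons * n_steps * dt)  # spikes per second

# A graded unit with activation 40 stands in for roughly 40 Hz of population firing.
print(population_rate(40.0))  # measured rate should come out near 40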

Finally, we believe that the brain has a fractal character for two reasons. First, it is likely that, at least in the cortex, the effective properties of long-range connectivity are similar to those of local, short-range connectivity. For example, both short- and long-range connectivity produce a balance between excitation and inhibition by virtue of connecting to both excitatory and inhibitory neurons (more on this in chapter 3). Thus, a model based on the properties of short-range connectivity within a localized cortical area could also describe a larger-scale model containing many such cortical areas simulated at a coarser level. The second reason is basically the same as the one given earlier about averaging over populations of neurons: if on average the population behaves roughly the same as the individual neuron, then the two levels of description are self-similar, which is what it means to be fractal.

In short, these arguments provide a basis for optimism that models based on neurobiological data can provide useful accounts of cognitive phenomena, even those that involve large, widely distributed areas of the brain. The models described in this book substantiate some of this optimism, but certainly this issue remains an open and important question for the computational cognitive neuroscience enterprise. The following historical perspective on this enterprise provides an overview of some of the other important issues that have shaped the field.

    1.3 Historical Context

Although the field of computational cognitive neuroscience is relatively young, its boundaries are easily blurred into a large number of related disciplines, some of which have been around for quite some time. Indeed, research in any aspect of cognition, neuroscience, or computation has the potential to make an important contribution to this field. Thus, the entire space of this book could be devoted to an adequate account of the relevant history of the field. This section is instead intended to merely provide a brief overview of some of the particularly relevant historical context and motivation behind our approach. Specifically, we focus on the advances in understanding how networks of simulated neurons can lead to interesting cognitive phenomena, which occurred initially in the 1960s and then again in the period from the late ’70s to the present day. These advances form the main heritage of our approach because, as should be clear from what has been said earlier, the neural network modeling approach provides a crucial link between networks of neurons and human cognition.

The field of cognitive psychology began in the late 1950s and early ’60s, following the domination of the behaviorists. Key advances associated with this new field included its emphasis on internal mechanisms for mediating cognition, and in particular the use of explicit computational models for simulating cognition on computers (e.g., problem solving and mathematical reasoning; Newell & Simon, 1972). The dominant approach was based on the computer metaphor, which held that human cognition is much like processing in a standard serial computer.

In such systems, which we will refer to as “traditional” or “symbolic,” the basic operations involve symbol manipulation (e.g., manipulating logical statements expressed using dynamically bound variables and operators), and processing consists of a sequence of serial, rule-governed steps. Production systems became the dominant framework for cognitive modeling within this approach. Productions are essentially elaborate if-then constructs that are activated when their if-conditions are met, and they then produce actions that enable the firing of subsequent productions. Thus, these productions control the sequential flow of processing. As we will see, these traditional, symbolic models serve as an important contrast to the neural-network framework, and the two have been in a state of competition from the earliest days of their existence.

Even though the computer metaphor was dominant, there was also considerable interest in neuronlike processing during this time, with advances like: (a) the McCulloch and Pitts (1943) model of neural processing in terms of basic logical operations; (b) Hebb’s (1949) theory of Hebbian learning and the cell assembly, which holds that connections between coactive neurons should be strengthened, joining them together; and (c) Rosenblatt’s (1958) work on the perceptron learning algorithm, which could learn from error signals. These computational approaches built on fundamental advances in neurobiology, where the idea that the neuron is the primary information processing unit of the brain became established (the “neuron doctrine”; Shepherd, 1992), and the basic principles of neural communication and processing (action potentials, synapses, neurotransmitters, ion channels, etc.) were being developed. The dominance of the computer metaphor approach in cognitive psychology was nevertheless sealed with the publication of the book Perceptrons (Minsky & Papert, 1969), which proved that some of these simple neuronlike models had significant computational limitations — they were unable to learn to solve a large class of basic problems.
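For readers who want to see what these early learning rules look like, the following is a minimal sketch of a Hebbian update and a Rosenblatt-style perceptron update, written in Python/NumPy; the learning rates and function names are illustrative assumptions, not code from any of the cited works.

```python
import numpy as np

def hebbian_update(w, x, y, lrate=0.1):
    """Hebb (1949): strengthen weights between coactive units;
    x is the presynaptic input vector, y the postsynaptic activity."""
    return w + lrate * y * x

def perceptron_update(w, b, x, target, lrate=0.1):
    """Rosenblatt (1958): adjust weights in proportion to the error
    between a target and the unit's thresholded output."""
    y = float(np.dot(w, x) + b > 0.0)   # binary perceptron output
    err = target - y                     # the error signal that drives learning
    return w + lrate * err * x, b + lrate * err
```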

While a few hardy researchers continued studying these neural-network models through the ’70s (e.g., Grossberg, Kohonen, Anderson, Amari, Arbib, Willshaw), it was not until the ’80s that a few critical advances brought the field back into real popularity. In the early ’80s, psychological (e.g., McClelland & Rumelhart, 1981) and computational (Hopfield, 1982, 1984) advances were made based on the activation dynamics of networks. Then, the backpropagation learning algorithm was rediscovered by Rumelhart, Hinton, and Williams (1986b) (having been independently discovered several times before: Bryson & Ho, 1969; Werbos, 1974; Parker, 1985) and the Parallel Distributed Processing (PDP) books (Rumelhart et al., 1986c; McClelland et al., 1986) were published, which firmly established the credibility of neural network models. Critically, the backpropagation algorithm eliminated the limitations of the earlier models, enabling essentially any function to be learned by a neural network. Another important advance represented in the PDP books was a strong appreciation for the importance of distributed representations (Hinton, McClelland, & Rumelhart, 1986), which have a number of computational advantages over symbolic or localist representations.
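To illustrate the kind of limitation that backpropagation removed, the following sketch trains a small two-layer network on XOR, the classic problem that single-layer perceptrons cannot solve. This is generic backpropagation in Python/NumPy rather than any algorithm used later in the book, and the network size, learning rate, and number of epochs are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the canonical problem single-layer perceptrons cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 sigmoid units with small random starting weights.
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

lrate = 0.5
for epoch in range(20000):
    # Forward pass
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Backward pass: chain rule through the sigmoid nonlinearities
    dY = (Y - T) * Y * (1 - Y)        # output-layer error signal
    dH = (dY @ W2.T) * H * (1 - H)    # error propagated back to the hidden layer
    # Gradient-descent weight updates
    W2 -= lrate * H.T @ dY; b2 -= lrate * dY.sum(axis=0)
    W1 -= lrate * X.T @ dH; b1 -= lrate * dH.sum(axis=0)

print(np.round(Y, 2))  # with these settings the outputs should approach 0, 1, 1, 0
```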

Backpropagation led to a new wave of cognitive modeling (which often goes by the name connectionism). Although it represented a step forward computationally, backpropagation was viewed by many as a step backward from a biological perspective, because it was not at all clear how it could be implemented by biological mechanisms (Crick, 1989; Zipser & Andersen, 1988). Thus, backpropagation-based cognitive modeling carried on without a clear biological basis, causing many such researchers to use the same kinds of arguments used by supporters of the computer metaphor to justify their approach (i.e., the “computational level” arguments discussed previously). Some would argue that this deemphasizing of the biological issues made the field essentially a reinvented computational cognitive psychology based on “neuronlike” processing principles, rather than a true computational cognitive neuroscience.

In parallel with the expanded influence of neural network models in understanding cognition, there was a rapid growth of more biologically oriented modeling. We can usefully identify several categories of this type of research. First, we can divide the biological models into those that emphasize learning and those that do not. The models that do not emphasize learning include detailed biophysical models of individual neurons (Traub & Miles, 1991; Bower, 1992), information-theoretic approaches to processing in neurons and networks of neurons (e.g., Abbott & LeMasson, 1993; Atick & Redlich, 1990; Amit, Gutfreund, & Sompolinsky, 1987; Amari & Maginu, 1988), and refinements and extensions of the original Hopfield (1982, 1984) models, which hold considerable appeal due to their underlying mathematical formulation in terms of concepts from statistical physics. Although this research has led to many important insights, it tends to make less direct contact with cognitively relevant issues (though the Hopfield network itself provides some centrally important principles, as we will see in chapter 3, and has been used as a framework for some kinds of learning).

The biologically based learning models have tended to focus on learning in the early visual system, with an emphasis on Hebbian learning (Linsker, 1986; Miller, Keller, & Stryker, 1989; Miller, 1994; Kohonen, 1984; Hebb, 1949). Importantly, a large body of basic neuroscience research supports the idea that Hebbian-like mechanisms are operating in neurons in most cognitively important areas of the brain (Bear, 1996; Brown, Kairiss, & Keenan, 1990; Collingridge & Bliss, 1987). However, Hebbian learning is generally fairly computationally weak (as we will see in chapter 5), and suffers from limitations similar to those of the 1960s generation of learning mechanisms. Thus, it has not been as widely used as backpropagation for cognitive modeling because it often cannot learn the relevant tasks.

In addition to the cognitive (connectionist) and biological branches of neural network research, considerable work has been done on the computational end. It has been apparent that the mathematical basis of neural networks has much in common with statistics, and the computational advances have tended to push this connection further. Recently, the use of the Bayesian framework for statistical inference has been applied to develop new learning algorithms (e.g., Dayan, Hinton, Neal, & Zemel, 1995; Saul, Jaakkola, & Jordan, 1996), and more generally to understand existing ones. However, none of these models has yet been developed to the point where they provide a framework for learning that works reliably on a wide range of cognitive tasks, while simultaneously being implementable by a reasonable biological mechanism. Indeed, most (but not all) of the principal researchers in the computational end of the field are more concerned with theoretical, statistical, and machine-learning kinds of issues than with cognitive or biological ones.

In short, from the perspective of the computational cognitive neuroscience endeavor, the field is in a somewhat fragmented state, with modelers in computational cognitive psychology primarily focused on understanding human cognition without close contact with the underlying neurobiology, biological modelers focused on information-theoretic constructs or computationally weak learning mechanisms without close contact with cognition, and learning theorists focused at a more computational level of analysis involving statistical constructs without close contact with biology or cognition. Nevertheless, we think that a strong set of cognitively relevant computational and biological principles has emerged over the years, and that the time is ripe for an attempt to consolidate and integrate these principles.

    1.4 Overview of Our Approach

This brief historical overview provides a useful context for describing the basic characteristics of the approach we have taken in this book. Our core mechanistic principles include both backpropagation-based error-driven learning and Hebbian learning, the central principles behind the Hopfield network for interactive, constraint-satisfaction style processing, distributed representations, and inhibitory competition. The neural units in our simulations use equations based directly on the ion channels that govern the behavior of real neurons (as described in chapter 2), and our neural networks incorporate a number of well-established anatomical and physiological properties of the neocortex (as described in chapter 3). Thus, we strive to establish detailed connections between biology and cognition, in a way that is consistent with many well-established computational principles.
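As a preview of the style of neuron model meant here (developed properly in chapter 2), the following is a minimal sketch of a conductance-based point-neuron update in which excitatory, inhibitory, and leak channels each pull the membrane potential toward their own reversal potentials. All of the constants below are illustrative placeholders rather than the book's actual parameters.

```python
def update_membrane_potential(v_m, g_e, g_i, g_l=0.1, dt=0.1,
                              e_e=1.0, e_i=0.15, e_l=0.15):
    """One Euler step of a conductance-based point-neuron update: each
    channel (excitation, inhibition, leak) drives the membrane potential
    toward its own reversal potential in proportion to its conductance.
    All constants here are illustrative placeholders."""
    i_net = (g_e * (e_e - v_m) +
             g_i * (e_i - v_m) +
             g_l * (e_l - v_m))
    return v_m + dt * i_net

# Example: with strong excitation, v_m climbs from rest toward e_e over time.
v = 0.15
for _ in range(50):
    v = update_membrane_potential(v, g_e=0.4, g_i=0.1)
print(round(v, 3))
```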

Our approach can be seen as an integration of a number of different themes, trends, and developments (O’Reilly, 1998). Perhaps the most relevant such development was the integration of a coherent set of neural network principles into the GRAIN framework of McClelland (1993). GRAIN stands for graded, random, adaptive, interactive, (nonlinear) network. This framework was primarily motivated by (and applied to) issues surrounding the dynamics of activation flow through a neural network. The framework we adopt in this book incorporates and extends these GRAIN principles by emphasizing learning mechanisms and the architectural properties that support them.

For example, there has been a long-standing desire to understand how more biologically realistic mechanisms could give rise to error-driven learning (e.g., Hinton & McClelland, 1988; Mazzoni, Andersen, & Jordan, 1991). Recently, a number of different frameworks for achieving this goal have been shown to be variants of a common underlying error propagation mechanism (O’Reilly, 1996a). The resulting algorithm, called GeneRec, is consistent with known biological mechanisms of learning, makes use of other biological properties of the brain (including interactivity), and allows for realistic neural activation functions to be used. Thus, this algorithm plays an important role in our integrated framework by allowing us to use the principle of backpropagation learning without conflicting with the desire to take the biology seriously.
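To give a flavor of the kind of rule being referred to, here is a minimal sketch of a GeneRec-style, error-driven weight update based on the difference between an "expectation" (minus) phase and an "outcome" (plus) phase of activity. The exact form and parameters of the algorithm are developed later in the book, so treat this Python sketch only as an illustration.

```python
def generec_style_update(w, x_minus, y_plus, y_minus, lrate=0.01):
    """Sketch of an error-driven, GeneRec-style update: the weight from a
    sending unit x to a receiving unit y changes in proportion to the
    sending activity times the difference between the receiving unit's
    plus-phase (outcome) and minus-phase (expectation) activities.
    Illustrative form only; the book develops the exact algorithm later."""
    return w + lrate * x_minus * (y_plus - y_minus)
```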

Another long-standing theme in neural network models is the development of inhibitory competition mechanisms (e.g., Kohonen, 1984; McClelland & Rumelhart, 1981; Rumelhart & Zipser, 1986; Grossberg, 1976). Competition has a number of important functional benefits emphasized in the GRAIN framework (which we will explore in chapter 3) and is generally required for the use of Hebbian learning mechanisms. It is technically challenging, however, to combine competition with distributed representations in an effective manner, because the two tend to work at cross purposes. Nevertheless, there are good reasons to believe that the kinds of sparse distributed representations that should in principle result from competition provide a particularly efficient means for representing the structure of the natural environment (e.g., Barlow, 1989; Field, 1994; Olshausen & Field, 1996). Thus, an important part of our framework is a mechanism of neural competition that is compatible with powerful distributed representations and can be combined with interactivity and learning in a way that was not generally possible before (O’Reilly, 1998, 1996b).
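As a crude stand-in for what such a competition mechanism does, the following Python sketch keeps only the k most strongly driven units active and silences the rest; the actual mechanism developed in chapter 3 is more graded and biologically grounded than this hard cutoff, and the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def sparse_competition(net_input, k=3):
    """Crude k-winners-take-all-style competition: only the k units with
    the strongest net input stay active; all others are suppressed to zero.
    A stand-in for the more graded inhibitory mechanism developed later."""
    act = np.zeros_like(net_input)
    winners = np.argsort(net_input)[-k:]   # indices of the k largest inputs
    act[winners] = net_input[winners]      # winners keep their activity
    return act

print(sparse_competition(np.array([0.2, 0.9, 0.1, 0.7, 0.5, 0.8]), k=2))
# -> [0.  0.9 0.  0.  0.  0.8]
```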

The emphasis throughout the book is on the facts of the biology, the core computational principles just described, which underlie most of the cognitive neural network models that have been developed to date, and their interrelationship in the context of a range of well-studied cognitive phenomena. To facilitate and simplify the hands-on exploration of these ideas by the student, we take advantage of a particular implementational framework, called Leabra (local, error-driven and associative, biologically realistic algorithm), that incorporates all of the core mechanistic principles. Leabra is pronounced like the astrological sign Libra, which emphasizes the balance between many different objectives that is achieved by the algorithm.

To the extent that we are able to understand a wide range of cognitive phenomena using a consistent set of biological and computational principles, one could consider the framework presented in this book to be a “first draft” of a coherent framework for computational cognitive neuroscience. This framework provides a useful consolidation of existing ideas, and should help to identify the limitations and problems that will need to be solved in the future.

Newell (1990) provided a number of arguments in favor of developing unified theories of cognition, many of which apply to our approach of developing a coherent framework for computational cognitive neuroscience. Newell argued that it is relatively easy (and thus relatively uninformative) to construct specialized theories of specific phenomena. In contrast, one encounters many more constraints by taking on a wider range of data, and a theory that can account for these data is thus much more likely to be true. Given that our framework bears little resemblance to Newell’s SOAR architecture, it is clear that just the process of making a unified architecture does not guarantee convergence on some common set of principles. However, it is clear that casting a wider net imposes many more constraints on the modeling process, and the fact that a single set of principles can be used to model the wide range of phenomena covered in this book lends some measure of validity to the undertaking.

Chomsky (1965) and Seidenberg (1993) also discussed the value of developing explanatory theories that explain phenomena in terms of a small set of independently motivated principles, in contrast with descriptive theories that essentially restate phenomena.

    1.5 General Issues in Computational Modeling

The preceding discussion of the benefits of a unified model raises a number of more general issues regarding the benefits of computational modeling¹ as a methodology for cognitive neuroscience. Although we think the benefits generally outweigh the disadvantages, it is also important to be cognizant of the potential traps and problems associated with this methodology. We will just provide a brief summary of these advantages and problems here.

¹We consider both models that are explicitly simulated on a computer and more abstract mathematical models to be computational models, in that both are focused on the computational processing of information in the brain.


    Advantages:

Models help us to understand phenomena. A computational model can provide novel sources of insight into behavior, for example by providing a counterintuitive explanation of a phenomenon, or by reconciling seemingly contradictory phenomena (e.g., by complex interactions among components). Seemingly different phenomena can also be related to each other in nonobvious ways via a common set of computational mechanisms.

Computational models can also be lesioned and then tested, providing insight into behavior following specific types of brain damage, and in turn, into normal functioning. Often, lesions can have nonobvious effects that computational models can explain.

By virtue of being able to translate between functional desiderata and the biological mechanisms that implement them, computational models enable us to understand not just how the brain is structured, but why it is structured the way it is.

Models deal with complexity. A computational model can deal with complexity in ways that verbal arguments cannot, producing satisfying explanations of what would otherwise just be vague, hand-wavy arguments. Further, computational models can handle complexity across multiple levels of analysis, allowing data across these levels to be integrated and related to each other. For example, the computational models in this book show how biological properties give rise to cognitive behaviors in ways that would be impossible with simple verbal arguments.

Models are explicit. Making a computational model forces you to be explicit about your assumptions and about exactly how the relevant processes actually work. Such explicitness carries with it many potential advantages.

First, explicitness can help in deconstructing psychological concepts that may rely on homunculi to do their work. A homunculus is a “little man,” and many theories of cognition make unintended use of them by embodying particular components (often “boxes”) of the theory with magical powers that end up doing all the work in the theory. A canonical example is the “executive” theory of prefrontal cortex function: if you posit an executive without explaining how it makes all those good decisions and coordinates all the other brain areas, you haven’t explained too much (you might as well just put pinstripes and a tie on the box).

Second, an explicitly specified computational model can be run to generate novel predictions. A computational model thus forces you to accept the consequences of your assumptions. If the model must be modified to account for new data, it becomes very clear exactly what these changes are, and the scientific community can more easily evaluate the resulting deviance from the previous theory. Predictions from verbal theories can be tenuous due to lack of specificity and the flexibility of vague verbal constructs.

Third, explicitness can contribute to a greater appreciation for the complexities of otherwise seemingly simple processes. For example, before people tried to make explicit computational models of object recognition, it didn’t seem that difficult or interesting a problem — there is an anecdotal story about a scientist in the ’60s who was going to implement a model of object recognition over the summer. Needless to say, he didn’t succeed.

Fourth, making a computational model forces you to confront aspects of the problem that you might have otherwise ignored or considered to be irrelevant. Although one sometimes ends up using simplifications or stand-ins for these other aspects (see the list of problems that follows), it can be useful to at least confront these problems.

Models allow control. In a computational model you can control many more variables much more precisely than you can with a real system, and you can replicate results precisely. This enables you to explore the causal role of different components in ways that would otherwise be impossible.

Models provide a unified framework. As we discussed earlier, there are many advantages to using a single computational framework to explain a range of phenomena. In addition to providing a more stringent test of a theory, it encourages parsimony and also enables one to relate two seemingly disparate phenomena by understanding them in light of a common set of basic principles.

Also, it is often difficult for people to detect inconsistency in a purely verbal theory — we have a hard time keeping track of everything. However, a computational model reveals inconsistencies quite readily, because everything has to hang together and actually work.

    Problems:

Models are too simple. Models, by necessity, involve a number of simplifications in their implementation. These simplifications may not capture all of the relevant details of the biology, the environment, the task, and so on, calling into question the validity of the model.

Inevitably, this issue ends up being an empirical one that depends on how wrong the simplifying assumptions are and how much they influence the results. It is often possible for a model to make a perfectly valid point while using a simplified implementation because the missing details are simply not relevant — the real system will exhibit the same behavior for any reasonable range of detailed parameters. Furthermore, simplification can actually be an important benefit of a model — a simple explanation is easier to understand and can reveal important truths that might otherwise be obscured by details.

Models are too complex. On the flip side, other critics complain that models are too complex to understand why they behave the way they do, and so they contribute nothing to our understanding of human behavior. This criticism is particularly relevant if a modeler treats a computational model as a theory, and points to the mere fact that the model reproduces a set of data as an explanation of that data.

However, this criticism is less relevant if the modeler instead identifies and articulates the critical principles that underlie the model’s behavior, and demonstrates the relative irrelevance of other factors. Thus, a model should be viewed as a concrete instantiation of broader principles, not as an end unto itself, and the way in which the model “uses” these principles to account for the data must be made clear. Unfortunately, this essential step of making the principles clear and demonstrating their generality is often not taken. This can be a difficult step for complex models (which is, after all, one of the advantages of modeling in the first place!), but one made increasingly manageable with advances in techniques for analyzing models.

Models can do anything. This criticism is inevitably leveled at successful models. Neural network models do have a very large number of parameters in the form of the adaptable weights between units. Also, there are many degrees of freedom in the architecture of the model, and in other parameters that determine the behavior of the units. Thus, it might seem that there are so many parameters available that fitting any given set of behavioral phenomena is uninteresting. Relatedly, because of the large number of parameters, sometimes multiple different models can provide a reasonable account of a given phenomenon. How can one address this indeterminacy problem to determine which is the “correct” model?

The general issues of adopting a principled, explanatory approach are relevant here — to the extent that the model’s behavior can be understood in terms of more general principles, the success of the model can be attributed to these principles, and not just to random parameter fitting. Also, unlike many other kinds of models, many of the parameters in the network (i.e., the weights) are determined by principled learning mechanisms, and are thus not “free” for the modeler to set. In this book, most of the models use the same basic parameters for the network equations, and the cases where different parameters were used are strongly motivated.

The general answer to the indeterminacy problem is that as you apply a model to a wider range of data (e.g., different tasks, newly discovered biological constraints), and in greater detail on each task (e.g., detailed properties of the learning process), the models will be much more strenuously tested. It thus becomes much less likely that two different models can fit all the data (unless they are actually isomorphic in some way).

Models are reductionistic. One common concern is that mechanistic, reductionistic models can never tell us about the real essence of human cognition. Although this will probably remain a philosophical issue until very large-scale models can be constructed that actually demonstrate realistic, humanlike cognition (e.g., by passing the Turing test), we note that reconstructionism is a cornerstone of our approach. Reconstructionism complements reductionism by trying to reconstruct complex phenomena in terms of the reduced components.

Modeling lacks cumulative research. There seems to be a general perception that modeling is somehow less cumulative than other types of research. This perception may be due in part to the relative youth and expansive growth of modeling — there has been a lot of territory to cover, and a breadth-first search strategy has some obvious pragmatic benefits for researchers (e.g., “claiming territory”). As the field begins to mature, cumulative work is starting to appear (e.g., Plaut, McClelland, Seidenberg, & Patterson, 1996, built on earlier work by Seidenberg & McClelland, 1989, which in turn built on other models), and this book certainly represents a very cumulative and integrative approach.

The final chapter of the book will revisit some of these issues with the benefit of what comes in between.

1.6 Motivating Cognitive Phenomena and Their Biological Bases

Several aspects of human cognition are particularly suggestive of the kinds of neural mechanisms described in this text. We briefly describe some of the most important of these aspects here to further motivate and highlight the connections between cognition and neurobiology. However, as you will discover, these aspects of cognition are perhaps not the most obvious to the average person. Our introspections into the nature of our own cognition tend to emphasize the “conscious” aspects (because this is by definition what we are aware of), which appear to be serial (one thought at a time) and focused on a subset of things occurring inside and outside the brain. This fact undoubtedly contributed to the popularity of the standard serial computer model for understanding human cognition, which we will use as a point of comparison for the discussion that follows.

We argue that these conscious aspects of human cognition are the proverbial “tip of the iceberg” floating above the waterline, while the great mass of cognition that makes all of this possible floats below, relatively inaccessible to our conscious introspection. In the terminology of Rumelhart et al. (1986c), neural networks focus on the microstructure of cognition. Attempts to understand cognition by only focusing on what’s “above water” may be difficult, because all the underwater stuff is necessary to keep the tip above water in the first place — otherwise, the whole thing will just sink! To push this metaphor to its limits, the following are a few illuminating shafts of light down into this important underwater realm, and some ideas about how they keep the “tip” afloat. The aspects of cognition we will discuss are:

• Parallelism

• Gradedness

• Interactivity