Bilingualism: Language and Cognition
http://journals.cambridge.org/BIL

Tests of a dual-system model of speech category learning
W. TODD MADDOX and BHARATH CHANDRASEKARAN
Bilingualism: Language and Cognition / Volume 17 / Issue 04 / October 2014, pp 709-728
DOI: 10.1017/S1366728913000783, Published online: 17 January 2014
Link to this article: http://journals.cambridge.org/abstract_S1366728913000783

How to cite this article:
W. TODD MADDOX and BHARATH CHANDRASEKARAN (2014). Tests of a dual-system model of speech category learning. Bilingualism: Language and Cognition, 17, pp 709-728. doi:10.1017/S1366728913000783

Downloaded from http://journals.cambridge.org/BIL, IP address: 146.6.232.157 on 02 Oct 2014





Bilingualism: Language and Cognition 17 (4), 2014, 709–728 © Cambridge University Press 2014 doi:10.1017/S1366728913000783

Tests of a dual-system model of speech category learning∗

W. TODD MADDOX and BHARATH CHANDRASEKARAN
University of Texas, Austin

(Received: March 18, 2013; final revision received: November 7, 2013; accepted: November 19, 2013; first published online 17 January 2014)

In the visual domain, more than two decades of work has argued for the existence of dual category learning systems. The REFLECTIVE system uses working memory in an explicit fashion to develop and test rules for classifying. The REFLEXIVE system operates by implicitly associating perception with actions that lead to reinforcement. Dual-system models posit that in learning natural categories, learners initially use the reflective system and, with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in second language (L2) speech learning has not been systematically examined. In the study reported in this paper, monolingual native speakers of American English were trained to categorize Mandarin tones produced by multiple speakers. Our computational modeling approach demonstrates that learners use reflective and reflexive strategies during tone category learning. Successful learners use speaker-dependent, reflective analysis early in training and reflexive strategies by the end of training. Our results demonstrate that dual-learning systems are operative in L2 speech learning. Critically, learner strategies directly relate to individual differences in successful category learning.

Keywords: L2 acquisition, dual-learning systems, reflexive processing, reflective processing, individual differences

Introduction

A large body of neuropsychological, neuroimaging, and behavioral studies in vision has identified two separate systems that are operative during category learning: a REFLECTIVE system, in which processing is under conscious control, and a REFLEXIVE system that is not under conscious control (Ashby & Maddox, 2011; Ashby & Spiering, 2004; Maddox & Ashby, 2004; Nomura, Maddox, Filoteo, Ing, Gitelman, Parrish, Mesulam & Reber, 2007; Nomura & Reber, 2008; Poldrack, Clark, Pare-Blagoev, Shohamy, Creso Moyano, Myers & Gluck, 2001; Poldrack & Foerde, 2008; Poldrack & Packard, 2003). The reflective learning system uses working memory (WM) and executive attention to develop and test verbalizable rules based on feedback (Ashby, Maddox & Bohil, 2002; Maddox & Ashby, 2004; Maddox, Filoteo, Lauritzen, Connally & Hejl, 2005; Maddox, Love, Glass & Filoteo, 2008). In contrast, the reflexive learning system is neither consciously penetrable nor verbalizable, and operates by automatically associating perception with actions that lead to reward (Seger, 2008; Seger & Cincotta, 2005; Seger & Miller, 2010). This system is not dependent on WM and executive attention (DeCaro, Thomas & Beilock, 2008): it is implicit and procedural. Although there is anatomical evidence of extensive connectivity between auditory regions and the reflective and reflexive systems (Petrides & Pandya, 1988; Yeterian & Pandya, 1998), the dual-system framework has not been systematically applied in the auditory domain.

∗ The two authors contributed equally to the paper. This research was supported by NIMH grants MH077708 and DA032457 to WTM. We thank the Maddox Lab RAs for data collection.

W. Todd Maddox, Department of Psychology, University of Texas, 1 University Station, A8000, Austin, TX
[email protected]

This paper applies the dual-system theoretical framework to speech category learning, using computational modeling to examine the learning of natural speech categories (Mandarin tones) by adult English speakers. Computational modeling provides insight into the specific learning operations (e.g. reflective vs. reflexive) that participants employ (Cleeremans & Dienes, 2008) and can account for individual variability in task performance. Although accuracy rates provide information about the level of performance, they tell us little about the specific strategy being used by a participant, because there are a number of reflexive and reflective strategies that yield the same accuracy rate. The computational models used in the current study are constrained by our understanding of the neurobiology of the reflective and reflexive learning systems. Modeling therefore allows a more systematic examination of the computational strategies learners use while acquiring novel speech categories. We turn next to the neurobiology underlying the two learning systems.

Dual-learning systems: Neurobiology

The COMPETITION BETWEEN VERBAL AND IMPLICIT SYSTEMS (COVIS) model captures the dual-system framework, and proposes a neural circuitry involved in visual category learning (Ashby & Maddox, 2011; Maddox & Ashby, 2004; Nomura & Reber, 2008).


Figure 1. (Colour online) Artificial category structures (left panel (a): rule-based (RB); right panel (b): information-integration (II)) used to study dissociations between reflective and reflexive learning systems.

In COVIS, processing in the reflective, hypothesis-testing system is available to conscious awareness and is mediated by a circuit primarily involving structures in the dorsolateral prefrontal cortex, anterior caudate nucleus, anterior cingulate, and medial temporal lobe. Processing in the reflexive, procedural-based learning system is not consciously penetrable and operates by associating perception with actions that lead to reinforcement. Learning in the procedural system is mediated primarily by the posterior caudate nucleus and putamen (Ashby & Maddox, 2005; Ashby & O'Brien, 2005). COVIS assumes that the two systems compete throughout learning, but that initial learning is dominated by the reflective, hypothesis-testing system. Learners will continue to use the hypothesis-testing system until the output of the procedural system is more accurate, at which point control is passed to the reflexive, procedural system.
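This hand-off rule can be caricatured in a few lines of Python. The sketch below is purely illustrative and is not COVIS's actual trial-by-trial weighting equations (which involve system confidences and persistence parameters); the function name and its inputs are hypothetical.

```python
# Illustrative caricature (not the actual COVIS equations): each system
# keeps a running accuracy estimate over trials, and the reflective
# system keeps control until the reflexive estimate overtakes it.

def run_handoff(trials):
    """trials: list of (reflective_correct, reflexive_correct) pairs.
    Returns which system controls responding after each trial."""
    controllers = []
    reflective_hits = reflexive_hits = 0
    for n, (refl_ok, flex_ok) in enumerate(trials, start=1):
        reflective_hits += refl_ok
        reflexive_hits += flex_ok
        # Control passes only when the reflexive output is MORE accurate.
        if reflexive_hits / n > reflective_hits / n:
            controllers.append("reflexive")
        else:
            controllers.append("reflective")
    return controllers
```

In this caricature, ties favor the reflective system, mirroring the assumption that initial learning is reflective-dominated.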

Anatomical studies in animal models suggest that the primary and auditory association cortical regions are strongly connected to the reflective and reflexive systems. Retrograde anatomical labeling studies in primates demonstrate that the primary and association cortices are bi-directionally connected to the prefrontal cortex and form many-to-one projections to the caudate (Petrides & Pandya, 1988; Yeterian & Pandya, 1998). These studies lend neurobiological plausibility to the operation of a dual-system framework in the auditory domain. Note that extant dual-system models restrict themselves to the visual domain (Ashby & Ell, 2001; Ashby & Ennis, 2006; Ashby & Maddox, 2005, 2011), although the underlying systems are assumed to be domain-general.

Dual-system framework: Category structures

COVIS assumes that the reflective and reflexive systems are complementary and that each is dedicated to the learning of different types of category structures. Category structures that are learnable by the reflective system, like those shown in Figure 1a, are referred to as rule-based (RB) structures (Ashby & Maddox, 2011). The optimal verbal rule (denoted by the solid horizontal and vertical lines) is to "respond A to short, high frequency sounds, B to short, low frequency sounds, C to long, high frequency sounds, and D to long, low frequency sounds". In contrast, the structure in Figure 1b is not learnable by the reflective system, because the optimal rule (represented by the solid diagonal lines) is not verbalizable (frequency and duration involve incommensurable units). However, such structures can still be learned, and are referred to as information-integration (II) category structures. Participants who learn the II category structure are unable to articulate any verbal rules used to learn these categories, but likely integrate dimensions prior to decision-making.
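The RB/II contrast can be made concrete with two toy classifiers over a (duration, frequency) stimulus. The cutoffs, and the reduction of the II structure to two categories with a single linear bound, are hypothetical illustrations, not the actual Figure 1 structures.

```python
# Toy RB vs. II decision rules (hypothetical cutoffs, not the actual
# Figure 1 category structures).

def rule_based(duration_s, frequency_hz, dur_cut=0.5, freq_cut=1000.0):
    """RB: a verbalizable conjunctive rule -- 'respond A to short,
    high-frequency sounds, B to short, low-frequency sounds, ...'."""
    if duration_s < dur_cut:
        return "A" if frequency_hz >= freq_cut else "B"
    return "C" if frequency_hz >= freq_cut else "D"

def information_integration(duration_s, frequency_hz):
    """II (simplified to two categories): the optimal bound is a linear
    combination of both dimensions, which cannot be stated verbally
    because seconds and hertz are incommensurable units."""
    return "A" if 2000.0 * duration_s - frequency_hz > 0.0 else "B"
```

The RB rule can be announced out loud and tested against feedback; the II bound can only be approximated by pre-decisional integration of the two dimensions.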

These artificial category structures have been used to test the extent to which the two learning systems are separate. A large number of dissociations have been found between the two learning systems in the visual domain. One such critical difference is the reliance on WM (DeCaro, Carlson, Thomas & Beilock, 2009; DeCaro et al., 2008). The reflective learning system critically depends on WM, while the reflexive learning system does not. Individuals with high working memory capacity (WMC) learn rule-based categories faster than those with low WMC (DeCaro et al., 2008, 2009). Interestingly, this pattern does not hold true for II category structures. While experimenter-determined artificial category structures are a good test-bed on which to examine dissociations between the two systems, most naturally occurring categories cannot be cleanly demarcated as either RB or II categories.

The dual-system framework in second language acquisition (SLA)

While the dual-system framework has not been systematically examined in L2 speech category learning, there exists a large body of work that focuses on the role of explicit and implicit learning systems in SLA (see DeKeyser, 2008, for an extensive review). The mechanistic role of multiple learning systems has generated considerable interest as well as controversy, both in the empirical and practical (e.g. second language instruction) domain (Hulstijn, 2005; Krashen, 1982; Schmidt, 1995, 2012) and in the theoretical domain (e.g. models of SLA) (Paradis, 1985, 2004; Ullman, 2004, 2006). Early models argue for the existence of conscious and subconscious systems in language acquisition, with the subconscious system more critical to language learning. For example, Krashen's Monitor model argues that instruction may not be optimal for all language learners, because complex language rules are mostly acquired subconsciously (Krashen, 1982). In contrast, other accounts (such as the Noticing hypothesis) suggest that conscious learning processes are critical to SLA (Schmidt, 1995, 2012). More recent dual-system models argue for different roles for explicit and implicit learning systems in mediating different components of language. In the DP (Declarative-Procedural) model (Ullman, 2004, 2006) (for a related model and application see McClelland, McNaughton & O'Reilly, 1995), a declarative system subserves vocabulary learning, and a more automatic, procedural system, mediated by the basal ganglia, subserves key aspects of grammar processing. Learning by the declarative system involves fast mapping, while learning by the procedural system is more gradual, requiring more extensive practice. In SLA, the DP model predicts that the fast-mapping declarative system initially dominates during grammar processing, making L2 processing effortful and less automatic (Hernandez & Li, 2007). With practice and training (such as in advanced L2 learners), the more gradual procedural learning system becomes increasingly dominant, allowing more automaticity in L2 processing. The neurolinguistic theory of bilingualism also hypothesizes the existence of declarative and procedural learning systems that mediate metalinguistic and implicit linguistic competence, respectively (Paradis, 2004). In this model, implicit linguistic competence is achieved via incidental learning. It proposes that L2 learners have maximum difficulty acquiring language components that are implicitly acquired.

While much of the literature on implicit vs. explicit learning processes has focused on morphology, syntax, or vocabulary learning, much less is known about the role of the two systems in L2 phonetic acquisition. Here we extend a dual-system approach to understanding phonetic learning. Our approach differs from existing dual-system models in the language domain in several important ways. In the SLA literature, the declarative system, mediated by the medial temporal lobe, is argued to play a critical role in explicit processing. The neural bases of the implicit system are less clearly delineated. For example, the DP model proposes the basal ganglia as a possible structure for procedural learning. Others suggest a non-specific locus for implicit processes (Reber, 2013), or view implicit processes as a form of computing statistics (Evans, Saffran & Robe-Torres, 2009; Hay, Pelucchi, Graf Estes & Saffran, 2011; Romberg & Saffran, 2010; Saffran, Aslin & Newport, 1996). In our dual-system theoretical framework, the prefrontal cortex (PFC) is the primary system involved in explicit, RB learning. The PFC, in coordination with the head of the caudate and the anterior cingulate, is important in generating, testing, and revising hypotheses based on feedback. When rules are complex, the MTL-based declarative memory system keeps track of hypotheses that have been tested and those that have been rejected. The implicit, reflexive learning system in our model is instantiated within anatomic regions in the striatum. Importantly, the computational modeling used in the current study is neurobiologically constrained, based on knowledge about the reflective/reflexive systems and their communication with the primary and secondary sensory regions.

Learning L2 speech categories as adults

A significant challenge in speech perception and learning is the mapping of highly variable signals, produced by multiple speakers, to relevant categories, which can be likened to a categorization problem (Holt & Lotto, 2008, 2010). Electrophysiological work argues for the existence of stored representations of speech categories that are language-specific (Cheour, Ceponiene, Lehtokoski, Luuk, Allik, Alho & Näätänen, 1998; Näätänen, Lehtokoski, Lennes, Cheour, Huotilainen, Iivonen, Vainio, Alku, Ilmoniemi, Luuk, Allik, Sinkkonen & Alho, 1997). In existing neuroscientific models, these category representations are instantiated within auditory association regions in the posterior superior temporal gyrus (Hickok & Poeppel, 2004, 2007). In neuroimaging studies examining L2 speech learning, short-term auditory training has been shown to enhance activation of the auditory association areas (Wang, Sereno, Jongman & Hirsch, 2003).

The mechanisms underlying speech category learning have been a focus of several studies. Previous research has put forward a number of reasons for difficulties in L2 speech learning, attributing them to various levels of speech processing, including interference caused by existing speech categories, and interference due to a "warping" of auditory-perceptual space by prior experience with native speech categories (Best, 1993; Best, Morrongiello & Robson, 1981; Best & Tyler, 2007; Flege, 1999; Francis, Ciocca, Ma & Fenn, 2008; Francis & Nusbaum, 2002). Laboratory training studies have typically used trial-by-trial feedback and high-variability (multiple speakers) training to teach L2 speech categories (Bradlow, Akahane-Yamada, Pisoni & Tohkura, 1999; Lim & Holt, 2011; Lively, Pisoni, Yamada, Tohkura & Yamada, 1994; Tricomi, Delgado, McCandliss, McClelland & Fiez, 2006; Zhang, Kuhl, Imada, Iverson, Pruitt, Stevens, Kawakatsu, Tohkura & Nemoto, 2009). Feedback enhances learning by reducing errors, and multiple-speaker training results in learners refocusing their attention to cues that are relevant for distinguishing speech categories and/or reducing attention to irrelevant cues (Bradlow, 2008). Although unsupervised training results in some speech learning in adults, the addition of feedback results in substantially larger learning gains (Goudbeek, Cutler & Smits, 2008; McClelland, Fiez & McCandliss, 2002; Vallabha & McClelland, 2007). Studies have also examined the role of high-variability training in speech learning. Training with multiple speakers leads to better real-world generalization than single-speaker training (Bradlow, 2008; Lively, Logan & Pisoni, 1993; Lively et al., 1994). This suggests that high-variability training may be a good way of creating robust representations of categories that are resistant to variation across speakers. However, the ability to successfully learn with high speaker variability during training may depend on the learner. For example, Perrachione, Lee, Ha and Wong (2011) showed that high-variability training was beneficial to some individuals, but others performed better when exposed to less variability during training.

Several speech perceptual learning studies have examined the role of speaker normalization processes in speech category learning (Kraljic & Samuel, 2005, 2006; Samuel & Kraljic, 2009). These studies show that adults continually adjust their phonemic categories to incorporate phonetic variations produced by novel speakers. Speaker-dependent perceptual analysis during categorization is driven by the nature of the speech categories (e.g. fricatives vs. stops). For fricatives, the spectral mean is a critical acoustic cue differentiating categories (e.g. [sh] vs. [s]: [sh] has a lower spectral mean); this cue also correlates with sex differences (male speakers have a lower spectral mean than female speakers). Perceptual training resulted in sex-specific adjustment of the [s] and [sh] categories (Kraljic & Samuel, 2007). In contrast, perceptual training leading to adjustment of stops (cued by voice-onset time, a timing cue that is not gender-specific) was not gender-specific.

Taken together, these studies suggest that feedback and speaker variability lead to significant L2 speech learning. While much of this research has focused on the mechanics of the perceptual system in speech learning, much less is known about the role of the dual category learning systems, which previous studies suggest are critical to learning RB and II category structures. This leads us to an important question: are speech categories similar to RB category structures, or to II category structures? Speech categories are typically difficult to verbalize, have multiple dimensions, and are highly variable. Generating and testing hypotheses for categories involving many dimensions is resource-intensive. Since the reflective system is dependent on WM and attention, generating rules/hypotheses for multiple dimensions may not be efficient. In addition, the redundancy and variability of cues available during speech perception prevent a simple one-to-one mapping of cues to categories. These considerations suggest that reflexive learning may be optimal for speech categories. Does this mean there is no role for the reflective system during L2 speech learning? Speaker-dependent analyses engage WM and executive attention resources (Wong, Nusbaum & Small, 2004). Therefore, it is possible that resolving speaker variability in a multi-speaker training paradigm may significantly engage the reflective system. Our hypothesis is therefore that speech learning is reflexive-optimal, but may involve some reflective analysis. During natural visual category learning, the dual-system framework assumes that the reflective and reflexive learning systems compete throughout learning for control (Ashby & Maddox, 2011). Early learning is mostly reflective and involves actively testing hypotheses and using feedback to validate or invalidate rules. With practice, learners switch to the more automatic, reflexive learning if the output of the reflexive system is more accurate than that of the reflective system. In line with the dual-system prediction, we propose that resolving speaker variability is at least partially a reflective process. In contrast, learning natural speech categories, we argue, is reflexive-optimal. We examine these hypotheses across two experiments involving Mandarin tone category learning by native English speakers with no prior experience with tone languages.

The current study

In the current study we employ trial-by-trial feedback and high speaker variability to examine the computational strategies that adult native English speakers employ while learning non-native Mandarin tone category structures.


Figure 2. (Left panel (a)) Sample fundamental frequency contours of the four Mandarin tones (T1: high level; T2: high rising; T3: low dipping; T4: high falling) produced by a male native Mandarin speaker used in the experiment. (Right panel (b)) The four tones plotted in a two-dimensional perceptual space (x-axis: pitch height; y-axis: pitch direction). Pitch height (dim. 1) and pitch direction (dim. 2) are major cues used to distinguish the tone categories.

Mandarin Chinese has four tone categories (ma1 'mother' [T1], ma2 'hemp' [T2], ma3 'horse' [T3], ma4 'scold' [T4]), described phonetically as high level, high rising, low dipping, and high falling, respectively (Figure 2a). Native English speakers find it particularly difficult to learn tone categories (Wang, Jongman & Sereno, 2003), and although training can enhance their tone identification and discrimination, such training paradigms have typically resulted in significant inter-individual differences in learning success (Perrachione, Lee, Ha & Wong, 2011).

Previous speech learning studies have typically relied on behavioral measures of accuracy to examine category learning. This is problematic, since the same accuracy rate can often be achieved by using qualitatively different strategies (e.g., reflective or reflexive). We apply reflective and reflexive computational models to (a) determine the extent to which speech category learning is mediated by reflective and reflexive learning, and (b) examine the source of individual differences in category learning success. Our working hypothesis, consistent with dual-system predictions, is that successful learners initially rely on the reflective learning system to perform speaker-dependent analyses, and switch to the reflexive learning system by the end of training. We will expand on our predictions in the next section.

Tone category learning: Category structure

A number of dimensions, such as pitch height or pitch direction, may serve as cues to tone identification. The perceptual saliency of these dimensions may be influenced by the presence of specific types of pitch patterns in a language's tonal inventory (Gandour, 1978, 1983), as well as by the occurrence of abstract tonal rules in the listener's phonological system (Hume & Johnson, 2001). Multidimensional scaling analysis of dissimilarity judgments, reaction time measures, and event-related responses converges on two primary dimensions that underlie the tone space: pitch height and pitch direction (Figure 2b).

Native speakers of Mandarin Chinese emphasize pitch direction more than pitch height. In contrast, English listeners place less emphasis on pitch direction in disambiguating tone categories. This is consistent with cue-weighting theories, which suggest that category learning difficulties in adults may be due to reduced emphasis on critical dimensions that are more invariant, and greater reliance on dimensions that are less invariant (Francis et al., 2008). This is consistent with reasons given for learner difficulties with other speech categories. For example, one reason given for the difficulty Japanese listeners have in learning the /l/ vs. /r/ category difference is reduced emphasis on the third formant, a critical dimension that differentiates the two categories (Hattori & Iverson, 2009; Iverson, Kuhl, Akahane-Yamada, Diesch, Tohkura, Kettermann & Siebert, 2003).

In Figure 3a, we plot the 80 stimuli used in our experiments (five consonant-vowel segments x four speakers x four tones) along two dimensions (pitch height: average fundamental frequency (x-axis); pitch direction: slope (y-axis)). A visual inspection of this space suggests that this category structure is most likely information-integration (compare with Figure 1b), and therefore most likely learned by the reflexive learning system. Of the two major acoustic dimensions, pitch height provides information about speaker identity and gender (a simple verbalizable strategy, e.g. male vs. female). Across languages, high pitch is typically aligned with female speakers, and low pitch with male speakers. Therefore, in a high-variability training environment involving male and female speakers, the initial reflective strategy that learners may attempt is one that creates gender-dependent perceptual spaces. This is consistent with the hypothesis of a previous study that showed gender-specific perceptual learning for category structures cued by dimensions that are also involved in processing sex differences across speakers (Samuel & Kraljic, 2009). In line with this, we suggest that separating the perceptual space on the basis of the sex of the speaker (Figures 3b and 3c) is a simple verbalizable strategy ("male" vs. "female") that listeners use. As Figures 3b and 3c show, separating the perceptual space by sex of speaker allows for significantly less overlap between tone categories, leading to less category confusion. Also, within the male and female perceptual spaces, there is little to distinguish between speakers (i.e., male speaker 1 vs. male speaker 2). Our working hypothesis is that learners use a reflective strategy early in training to resolve speaker variability, and a reflexive strategy later in training to differentiate tonal categories based on feedback.

Figure 3. (Colour online) (Top panel (a)) In the tone category training paradigm, we use 80 stimuli (5 segments x 4 speakers x 4 tones) that are plotted in the two-dimensional space (pitch height, direction). In the middle (b) and lower (c) panels, the stimuli are separated by male and female talkers. Within the male (b) and female (c) perceptual spaces, category separation is clearer than in the perceptual space that uses all speakers (a).
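The two dimensions of this space can be approximated from an F0 contour as its mean (pitch height) and its least-squares slope over time (pitch direction). A minimal sketch with a synthetic, T2-like rising contour; the numbers are illustrative, not the experimental stimuli:

```python
# Sketch: reduce an F0 contour to the two perceptual dimensions of the
# tone space -- pitch height (mean F0) and pitch direction
# (least-squares slope of F0 over time). The contour below is synthetic.

def pitch_dimensions(times_s, f0_hz):
    n = len(f0_hz)
    height = sum(f0_hz) / n                      # mean F0, in Hz
    t_mean = sum(times_s) / n
    num = sum((t - t_mean) * (f - height) for t, f in zip(times_s, f0_hz))
    den = sum((t - t_mean) ** 2 for t in times_s)
    direction = num / den                        # slope, in Hz/s
    return height, direction

# A rising, T2-like contour: 150 Hz to 230 Hz over 0.4 s.
height, direction = pitch_dimensions(
    [0.0, 0.1, 0.2, 0.3, 0.4], [150.0, 170.0, 190.0, 210.0, 230.0])
# height == 190.0 Hz, direction == 200.0 Hz/s (positive slope = rising)
```

A level tone (T1) would give a near-zero slope at a high mean; a falling tone (T4) a negative slope, placing each tone in a different region of the space.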

In Experiment 1, we examine the extent to which learners use reflective and reflexive strategies while learning tone categories. Our prediction, based on the dual-system framework, is that successful learners use a combination of reflective and reflexive strategies. Second, we predict that reflexive learning may be better for successful categorization of tone categories. Using separate perceptual spaces (male vs. female) may be a good learning strategy to reduce tone confusion, but likely requires more WM resources than a strategy that does not use separate perceptual spaces. In Experiment 2, we study the extent to which the two strategies (speaker separation vs. non-separation) are mediated by WM and executive attention resources. Specifically, we examine WMC differences between speaker separators and non-separators. We predict that individuals with lower WMC will be less effective at using separate perceptual spaces, and therefore less effective at tone category learning. The logic here is that separating the perceptual space using a verbalizable strategy (male vs. female) is a reflective strategy that therefore requires WM resources. We predict that individual differences in category learning success are related to differential strategy use between learners.

Experiment 1

Material and methods

In this section, we describe the stimulus space used to examine strategy in this study. We will then describe the model fitting approach in detail.

Stimuli

These consisted of natural native exemplars of the four Mandarin tones, tone 1 (T1), tone 2 (T2), tone 3 (T3), and tone 4 (T4). Monosyllabic Mandarin Chinese words (bu, di, lu, ma, and mi) that are minimally contrasted by the four tone categories were used in the experiment. Since these syllables exist in the American English inventory, this avoids the need to learn phonetic structures as well as the tone distinction (Alexander, Wong & Bradlow, 2005). By using different segments and multiple speakers, our aim is to expose learners to the variability inherent in natural language. Each of these syllables was produced in citation form with the four Mandarin tones. The speakers in Experiment 1 were two male and two female native speakers of Mandarin Chinese originally from Beijing. Two of these speakers (one male and one female) were used in Experiment 2. The stimuli were RMS amplitude and duration normalized (70 dB, 0.4 s) using the software Praat. Duration and amplitude envelope may be useful cues for the disambiguation of lexical tones. However, behavioral studies (Howie, 1976) and multidimensional scaling (MDS) analyses have shown that dimensions related to pitch (especially pitch height and pitch direction) are used primarily to distinguish tone categories (Francis et al., 2008). In fact, phonetically, Mandarin tones 1–4 are described using these two dimensions as "high-level", "high-rising", "low-dipping", and "high-falling", respectively. Five native speakers of Mandarin were asked to identify the tone categories (they were given four choices) and to rate their quality and naturalness. High identification (> 95%) was achieved across all five native speakers. Speakers rated these stimuli as highly natural.
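The amplitude step of that normalization can be sketched in a few lines. The paper used Praat; the pure-Python version below only illustrates the idea, and the target RMS value is an arbitrary linear-scale number rather than a calibrated 70 dB SPL level.

```python
import math

# Sketch of RMS amplitude normalization: scale a waveform so that its
# root-mean-square amplitude matches a target value. (The study's
# stimuli were normalized in Praat to 70 dB and 0.4 s; the target here
# is an arbitrary illustration.)

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_rms(samples, target_rms):
    """Scale the waveform so its RMS equals target_rms."""
    gain = target_rms / rms(samples)
    return [s * gain for s in samples]

# Toy waveform scaled to RMS 0.1 (arbitrary linear units).
wave = [0.0, 0.5, -0.5, 0.25, -0.25]
out = normalize_rms(wave, 0.1)
```

Duration normalization is a separate step (time-scale modification, e.g. PSOLA in Praat) and is not shown here.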

Model fitting approach

We fit each model at the individual participant level because of problems with interpreting fits to aggregate data (Ashby, Maddox & Lee, 1994; Estes, 1956; Maddox, 1999). We fit each model to a block of 80 trials, assuming a fixed decision strategy on each trial within the block. We fit three classes of models, with multiple instantiations possible within a class:

• A computational model of the reflexive procedural learning system. We used the STRIATAL PATTERN CLASSIFIER (SPC) (Maddox, Molis & Diehl, 2002), a model whose processing is consistent with what is known about the neurobiology of the procedural-based category learning system thought to underlie information-integration classification performance (Ashby, Alfonso-Reese, Turken & Waldron, 1998; Ashby & Ennis, 2006; Maddox et al., 2002; Nomura et al., 2007; Seger & Cincotta, 2005).

• Reflective RB models that instantiate hypothesis-testing strategies, such as the application of uni-dimensional or conjunctive rules. These are verbalizable strategies.

• A random responder model that assumes that the participant guesses on each trial.

The model parameters were estimated using maximum likelihood procedures (Ashby, 1992; Wickens, 1982) and the models compared using Akaike weights (Wagenmakers & Farrell, 2004), as described in detail in the Results section. The details of each model are outlined in the next section.

Striatal Pattern Classifier

The SPC assumes that stimuli are represented perceptually in higher-level auditory areas, such as the superior temporal gyrus. Because of the massive many-to-one (approximately 10,000-to-1) convergence of afferents from the primary and secondary auditory cortices to the striatum (Ashby & Ennis, 2006; Wilson, 1995), a low-resolution map of perceptual space is represented among the STRIATAL UNITS. Within the auditory domain it is well known that there are direct projections from secondary auditory areas such as the superior temporal gyrus and supratemporal plane to the caudate (Arnauld, Jeantet, Arsaut & Desmotes-Mainard, 1996; Hikosaka, Sakamoto & Usui, 1989; Yeterian & Pandya, 1998). During learning the striatal units become associated with one of the category labels, so that after learning is complete, a category response label is associated with each of a number of different regions of perceptual space. In effect, the striatum learns to associate a response with clumps of cells in the auditory cortex.¹ The SPC assumes that there is one striatal “unit” in the pitch height–pitch direction space for each category, yielding a total of four striatal units. Because the location of one of the units can be fixed, and since a uniform expansion or contraction of the space will not affect the location of the resulting response region partitions, the SPC contains six free parameters: five that determine the location of the units, and one that represents the noise associated with the placement of the striatal units. Figure 4a displays a scatterplot of the responses and response regions for the four tone categories in Figure 3a generated from a version of the SPC. It is worth mentioning that versions of the SPC have already been applied in the auditory domain. Specifically, Maddox, Ing and Lauritzen (2006) applied the model to data from an artificial auditory category learning task, and Maddox, Molis and Diehl (2002) to data from an auditory vowel categorization task.
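The SPC decision rule can be sketched computationally. In the sketch below (our own illustration, not the authors' code), each tone category is represented by one hypothetical unit in the two-dimensional pitch height–pitch direction space; a stimulus is assigned most strongly to the category of the nearest unit, and a single noise parameter converts distances into graded response probabilities for maximum likelihood fitting:

```python
import numpy as np

def spc_response_probs(stimuli, units, noise):
    """Graded response probabilities for a striatal-pattern-classifier sketch.

    stimuli: (n, 2) array of (pitch height, pitch direction) coordinates.
    units:   (4, 2) array, one hypothetical striatal unit per tone category.
    noise:   scalar > 0; larger values make responding more random.
    """
    # Euclidean distance from every stimulus to every unit.
    d = np.linalg.norm(stimuli[:, None, :] - units[None, :, :], axis=2)
    # Closer units get higher probability (softmax over negative distance).
    logits = -d / noise
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def spc_negloglik(stimuli, responses, units, noise):
    """Negative log-likelihood of observed responses (coded 0-3)."""
    p = spc_response_probs(stimuli, units, noise)
    return -np.log(p[np.arange(len(responses)), responses]).sum()
```

Fitting the six free parameters (five unit coordinates with one unit fixed, plus noise) would then amount to minimizing `spc_negloglik` with a general-purpose optimizer.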

1 It is important to be clear that the SPC is a computational model inspired by what is known about the neurobiology of the striatum. The striatal “units” are therefore hypothetical and could be interpreted in terms of other computational models (e.g., as “prototypes” in a multiple prototype model such as SUSTAIN: Love, Medin & Gureckis, 2004). In addition, we do not model learning in the SPC, in the sense that we do not update association weights between units and category labels. Learning models have been proposed (Ashby & Maddox, 2011), but are not used here, because of their complexity.


716 W. Todd Maddox and Bharath Chandrasekaran

Figure 4. (Colour online) Scatterplot of the responses along with the decision boundaries that separate response regions from versions of the (a) striatal pattern classifier, (b) conjunctive rule-based, (c) uni-dimensional_height, and (d) uni-dimensional_direction models as applied to the stimuli from Figure 3a.

Conjunctive rule-based model

A conjunctive RB model that assumes that the participant sets two criteria along the pitch height dimension and a third along the pitch direction dimension was also applied to the data. The model assumes that the two criteria along the pitch height dimension are used to separate the stimuli into low, medium or high. Low pitch height items are classified as tone category 3 (T3) and high pitch height items as tone category 1 (T1). If an item is classified as medium pitch height, the pitch direction dimension is examined. The single criterion along the pitch direction dimension is used to separate the stimuli into low and high pitch direction. Stimuli that have medium pitch height and low pitch direction (negative slope) are classified as tone category 4 (T4) and medium pitch height items of high pitch direction as tone category 2 (T2). Figure 4b displays a scatterplot of the responses and response regions for the four tone categories in Figure 3a generated from a version of the Conjunctive model. This model contains four free parameters: three criteria and one noise parameter.
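The verbal rule above is easy to express in code. The following sketch (our illustration; the criterion values are the model's free parameters, shown here without the noise component) maps a stimulus to a tone label:

```python
def conjunctive_rule(height, direction, c_low, c_high, c_dir):
    """Conjunctive rule sketch: two criteria on pitch height (c_low < c_high)
    and one criterion (c_dir) on pitch direction. Returns a tone label 1-4."""
    if height < c_low:
        return 3                           # low pitch height -> T3
    if height > c_high:
        return 1                           # high pitch height -> T1
    # Medium pitch height: consult pitch direction.
    return 4 if direction < c_dir else 2   # falling -> T4, rising -> T2
```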

Uni-dimensional rule-based models

A uni-dimensional_height RB model that assumes that the participant sets three criteria along the pitch height dimension was also applied to the data. This model assumes that the three criteria along the pitch height dimension are used to separate the stimuli into low, medium-low, medium-high or high, each of these being associated with one of the tone categories. Notice that this model completely ignores the pitch direction. Although, given four category labels, 24 versions of the model are possible, some are highly unrealistic (e.g., a model that assumes that tone category 1 (T1) was the lowest in pitch height). We examined the eight most reasonable variants of the model.

A uni-dimensional_direction RB model that assumes that the participant sets three criteria along the pitch direction dimension was also applied to the data. This model assumes that the three criteria along the pitch direction dimension are used to separate the stimuli into low, medium-low, medium-high or high, each of



these being associated with one of the tone categories. Notice that this model completely ignores the pitch height dimension. Although, given four category labels, 24 versions of the model are possible, many are highly unrealistic. We examined the two most reasonable variants of the model. Figure 4c displays a scatterplot of the responses and response regions for the four tone categories in Figure 3a generated from a version of the uni-dimensional_height model, and Figure 4d displays the corresponding scatterplot generated from a version of the uni-dimensional_direction model. The uni-dimensional models each contain four free parameters: three criteria and one noise parameter.
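A uni-dimensional variant needs only three sorted criteria on a single dimension plus a mapping from the four resulting bins to tone labels. A sketch (ours; the bin-to-tone order shown in the test is one hypothetical variant among those the models consider):

```python
import bisect

def unidimensional_rule(value, criteria, tone_order):
    """Uni-dimensional rule sketch: three sorted criteria partition one
    dimension (pitch height or pitch direction) into four bins, and
    tone_order assigns a tone label (1-4) to each bin."""
    # bisect returns the index of the bin that `value` falls into.
    return tone_order[bisect.bisect(criteria, value)]
```

For example, `unidimensional_rule(0.9, [0.25, 0.5, 0.75], (3, 2, 4, 1))` returns 1, i.e., the highest bin is mapped to T1 under this hypothetical ordering.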

Random Responder Model

This model assumes a fixed probability of responding tone 1, tone 2, tone 3, and tone 4 but allows for response biases. The model has three free parameters to reflect the predicted probability of responding “1,” “2,” or “3”, the probability of responding “4” being equal to one minus the sum of the other three.
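The random responder likelihood is a simple multinomial. A sketch (ours), with the three free response probabilities and the fourth derived by subtraction:

```python
import numpy as np

def random_responder_negloglik(responses, p1, p2, p3):
    """Negative log-likelihood of responses (coded 0-3 for tones 1-4)
    under a guessing model with response biases; p4 = 1 - (p1+p2+p3)."""
    p = np.array([p1, p2, p3, 1.0 - (p1 + p2 + p3)])
    counts = np.bincount(responses, minlength=4)  # responses per category
    return float(-(counts * np.log(p)).sum())
```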

Modeling speaker separation

An a priori prediction was that learners use a verbalizable strategy (speaker sex) to separate male and female perceptual spaces. We predicted that using separate perceptual spaces reduces overlap between categories and increases successful learning of the tone categories. The modeling procedure described so far assumes that each model is applied to a block of 80 trials using the 80 stimuli. Figure 3a shows a modeling procedure that assumes no speaker separation (a non-separation model).

To model the presence of speaker separation (a separation model), we assume that the participant converted the 80-stimuli perceptual space in Figure 3a into two separate perceptual spaces, one that characterizes the 40 stimuli spoken by male speakers and one that characterizes the 40 stimuli spoken by female speakers. Scatterplots of the stimuli associated with the male and female sub-perceptual spaces are displayed in Figures 3b and 3c.

We fit each of the models outlined above (SPC, conjunctive, uni-dimensional_height, uni-dimensional_direction, and random responder) separately to the 40 trials with a female speaker and the 40 trials with the male speaker and estimated separate parameters for each of the relevant perceptual spaces (male or female). For example, the conjunctive model required four parameters when no speaker separation was assumed, but eight when speaker separation was assumed.
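The separation variant of any model can be obtained mechanically: fit the same model independently to the male-speaker and female-speaker trials and combine the fits. A sketch of that wrapper (ours; `fit_model` is a stand-in for whatever routine returns a model's maximized log-likelihood and parameter count):

```python
def fit_separation(fit_model, male_trials, female_trials):
    """Fit one model separately to each speaker-sex subset of a block.

    fit_model: callable returning (max_log_likelihood, n_free_parameters).
    Log-likelihoods add across the independent subsets, and the parameter
    count doubles relative to the non-separation fit (which AIC penalizes).
    """
    ll_male, k_male = fit_model(male_trials)
    ll_female, k_female = fit_model(female_trials)
    return ll_male + ll_female, k_male + k_female
```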

Participants

Twenty-four participants aged 18–35 were recruited from the undergraduate population at the University of

Texas at Austin; they were paid $10 per hour for their participation. The participants were monolingual and raised in monolingual English households, as reported in detailed background questionnaires. Participants with significant exposure to another language before the age of 12 were not included. Participants reported no history of neurological, visual, or hearing deficits. A detailed language history questionnaire was used to ensure that participants had no previous exposure to a tone language. All participants provided informed consent and were debriefed following the conclusion of the experiment, in accordance with UT IRB requirements. Participants (n = 2) with poorer performance in the final block than the first were excluded, leaving data from 22 participants for the statistical analyses. Average accuracy in the final block for the excluded participants was 29%, which is close to chance-level performance (25%).

Tone category training procedure

In each trial, participants were presented with a single example from one of four Mandarin tone categories (T1, T2, T3, or T4) and instructed to place it into one of four categories. The participants were told that high accuracy levels are possible. They were given feedback on each trial and exposed to multiple speakers throughout the training program. They listened to 80 stimuli per block (four tone categories × five syllables × four speakers). Within each block, the speakers were randomized. Each participant completed six 80-trial blocks of training. Participants responded by pressing one of four buttons on a response box labeled “1,” “2,” “3,” or “4”. Corrective feedback was displayed for one second on the screen immediately following the button press and consisted of the word “Correct” or “Error”, followed by the correct label for the tone. For example, on a correct tone 2 (T2) trial the feedback display was as follows: “Correct, that was a 2”. On an incorrect response trial, if tone 3 was the correct response, the feedback display was: “Error, that was a 3”. A one-second inter-trial interval followed the feedback.

Results: Overall accuracy

Figure 5a displays the average accuracy along with standard error bars. We also include average accuracy for participants best fit by a separation or non-separation model (discussed below). We conducted a two-way ANOVA (block × tone category) and found significant learning across blocks (F(5, 95) = 23.98, p < 0.001, partial η² = .56), with average accuracy increasing from 42% in block 1 (b1) to 71% in block 6 (b6). We also found a main effect of tone category (F(3, 57) = 4.51, p = 0.007, partial η² = .19). Average accuracy was greatest for T3 (69%); pairwise comparison showed significantly better learning for T3 stimuli than for T2 (p = 0.004) or T4



Figure 5. Overall proportion correct for final block separators versus non-separators in Experiment 1 (A) and Experiment 2 (B).

(p = 0.007). No other pairwise comparison was significant. The interaction between tone category and block was not significant (F(15, 285) = 1.11, p = 0.35, partial η² = .05), suggesting similar learning patterns across the four tones. A similar pattern held for tones produced by female and male speakers: average accuracy increased from 44% in block 1 to 72% in block 6 for tones spoken by female speakers and from 41% in block 1 to 69% in block 6 for tones spoken by male speakers.

Results: Modeling

Model fitting and model comparison

As outlined above, each model was fit to the data from each participant on a block-by-block basis. The models were fit to the Mandarin tone category learning data from each trial by minimizing negative log-likelihood. We used Akaike weights to compare the relative fit of each model (Akaike, 1974; Wagenmakers & Farrell, 2004). Akaike weights are derived from Akaike’s Information Criterion (AIC), which is used to compare models with different numbers of free parameters (AIC penalizes models with more free parameters). For each model i, AIC is defined

as:

AIC_i = −2 log L_i + 2V_i   (1)

where L_i is the maximum likelihood for model i and V_i is the number of free parameters in the model. Smaller AIC values indicate a better fit to the data. We first computed AIC values for each model and for each participant’s data in each block. Akaike weights were then calculated to obtain a continuous measure of degree-of-fit. A difference score is computed by subtracting the AIC of the best fitting model for each data set from the AIC of each model for the same data set:

Δ_i(AIC) = AIC_i − min AIC   (2)

From the differences in AIC we then computed the relative likelihood L of each model i with the transform:

L(M_i | data) ∝ exp(−½ Δ_i(AIC))   (3)

Finally, the relative model likelihoods were normalized by dividing the likelihood for each model by the sum of the likelihoods for all models. This yields the Akaike weights:

w_i(AIC) = exp(−½ Δ_i(AIC)) / Σ_k exp(−½ Δ_k(AIC))   (4)

These weights can be interpreted as the probability that the model is the best model given the data set and the set of candidate models (Wagenmakers & Farrell, 2004). Akaike weights range from 0 to 1.0, an Akaike weight of 0 implying that a given model is the best model with probability 0, and an Akaike weight of 1 implying that a given model is the best model with probability 1.0. Equivocal evidence in support of a given model is associated with an Akaike weight of 1/n, where n denotes the number of models being compared (e.g., with two models, an Akaike weight of 0.5 implies equivocal support for the given model).
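Equations (1)–(4) reduce to a few lines of code. A sketch (ours) that turns per-model negative log-likelihoods and parameter counts into Akaike weights:

```python
import numpy as np

def akaike_weights(neg_loglik, n_params):
    """Akaike weights for candidate models fit to the same data set.

    neg_loglik: per-model negative log-likelihoods (-log L_i).
    n_params:   per-model free-parameter counts (V_i).
    """
    aic = 2.0 * np.asarray(neg_loglik) + 2.0 * np.asarray(n_params)  # Eq. (1)
    delta = aic - aic.min()                                          # Eq. (2)
    rel = np.exp(-0.5 * delta)                                       # Eq. (3)
    return rel / rel.sum()                                           # Eq. (4)
```

Two models with identical AIC receive equal weights of 0.5, the equivocal-evidence case described above.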

Best fitting model vs. random responder model

We began by comparing the Akaike weights from the best fitting uni-dimensional, conjunctive or SPC model that assumed non-separation or separation with the best fitting random responder model. This comparison allowed us to determine whether the best fitting model was capturing noise or meaningful strategic responding. The results were clear. The resulting Akaike weights were .964, .990, .946, .956, .991, and .991 in blocks 1–6, respectively. In each case these values were significantly above 0.5 based on a one-sample t-test (all p’s < .0001), which indicates that the best fitting models are effectively fitting the data.

Best fitting non-separation model vs. best fitting separation model

Next we compared the Akaike weights from the best fitting separation model against the best fitting non-separation



Figure 6. Proportion of participants whose data was best fit by a non-separation or separation model as a function of block in Experiment 1 (A) and Experiment 2 (B).

model. This comparison allows us to determine whether the best fitting model is truly capturing additional strategic responding or just more noise. Again, the results were clear. When a separation model provided the best account of the data, the Akaike weights ranged from .880 to .982 and in every block were significantly above 0.5 based on a one-sample t-test (all p’s < .001). When a non-separation model provided the best account of the data, the Akaike weights ranged from .893 to .982 and in every block were significantly above 0.5 based on a one-sample t-test (all p’s < .001). These findings suggest that the best fitting model (separation or non-separation) is capturing meaningful strategic variance in the data, not just random noise.

Distribution of best fitting non-separation and separation models

Because it was hypothesized that speaker separation improves performance, we predicted that as the participants gained experience their data would increasingly be best fit by one of the separation models rather than one of the non-separation models. Figure 6 displays the proportion of participants whose data was best fit by a separation or non-separation model in each block. As a formal test of

our hypothesis, we compared the number of separators and non-separators across the first and final blocks. A χ² test suggested that the number of separators increased while the number of non-separators decreased from the first to the final block of trials (χ²(1, N = 22) = 5.94, p < .005).
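The test statistic for such a 2 × 2 contingency table (strategy: separator vs. non-separator, crossed with block: first vs. final) can be computed directly. A sketch (ours; the counts in the usage example are made up for illustration, not the study's data):

```python
import numpy as np

def chi_square_2x2(table):
    """Pearson chi-square statistic (1 df) for a 2x2 contingency table."""
    t = np.asarray(table, dtype=float)
    # Expected cell counts under independence: (row total * col total) / N.
    expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()
    return float(((t - expected) ** 2 / expected).sum())
```

For example, `chi_square_2x2([[5, 17], [15, 7]])` compares hypothetical separator/non-separator counts in the first and final blocks; values above 3.84 are significant at p < .05 with one degree of freedom.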

Separation strategy distribution for final block separators and final block non-separators

Another way to examine changes in the use of separation strategies across blocks is to compare the number of blocks of trials best fit by the separation model for participants whose final block of trials is best fit by a separation (hereafter referred to as final block separators) or non-separation (hereafter referred to as final block non-separators) model. We hypothesized that participants whose data is best fit by a separation model in the final block of trials will also be more likely to use separation strategies earlier in learning. The results supported our prediction (see Figure 7a), with significantly more blocks of trials best fit by a separation model for final block separators (4.9 blocks) than for final block non-separators (1.0 blocks) (F(1, 20) = 15.449, p < .001, partial η² = .436).

We also examined the first block of trials for which a separation model provided the best fit to the data for final block separators versus final block non-separators. We hypothesized that final block separators would begin to speaker-normalize sooner. The results supported our prediction (Figure 7b), with final block separators speaker-normalizing earlier (1.65 blocks) than non-separators (4.50 blocks) (F(1, 20) = 7.20, p < .005, partial η² = .265).

Learning curves for final block separators and final block non-separators

Figure 5a displays the learning curves for final block separators and final block non-separators. A 2 (model strategy) × 6 (block) mixed ANOVA was conducted on these data. We observed a main effect of model strategy (F(1, 20) = 6.70, p < .05, partial η² = .251), with performance significantly better for separators (.65) than for non-separators (.27). We also observed a main effect of block (F(5, 100) = 3.83, p < .01, partial η² = .161), suggesting significant learning. Finally, we observed a significant model strategy by block interaction (F(5, 100) = 2.59, p < .05, partial η² = .151). The interaction is characterized by significant learning in the separator group (F(5, 95) = 28.77, p < .001, partial η² = .602), and non-significant learning in the non-separators (F(5, 5) < 1.0).

Reflective and reflexive strategies and accuracy rates for final block separators. Here we examined performance for the reflective and reflexive strategies used by final block separators. Of the 20 final block separators, ten were best fit by the SPC, two by the conjunctive RB model, and eight by the uni-dimensional_height model.



Figure 7. A. Average number of blocks best fit by a separation model for final block separators and final block non-separators in Experiment 1. B. Average block first best fit by a separation model for final block separators and final block non-separators in Experiment 1. C. Average number of blocks best fit by a separation model for final block separators and final block non-separators in Experiment 2. D. Average block first best fit by a separation model for final block separators and final block non-separators in Experiment 2.

Because the optimal strategy requires application of a reflexive strategy, we predicted that reflexive participants would outperform reflective participants. To test this, we compared overall accuracy across the ten reflexive separators and the ten reflective separators. The effect of strategy was significant (F(1, 18) = 9.44, p < .01, partial η² = .344), with participants using a reflexive strategy (.762) outperforming those using a reflective strategy (.532).

Discussion

Experiment 1 examined Mandarin tone category learning in native English speakers. A series of computational models derived from a dual-system model of visual category learning were applied. These models capture two aspects of learning that are hypothesized as critical to a complete understanding of Mandarin tone category learning. First, the models capture the distinction between reflective category learning strategies that are available to conscious awareness and require WM and executive attention, and reflexive category learning strategies that

are not available to conscious awareness and operate without relying on WM or executive attention (Ashby et al., 1998; Ashby & Maddox, 2011). This distinction is modeled by placing constraints on the nature of the participant’s decision process (see Figure 4). Second, the models capture speaker-dependent strategies. The lack of speaker dependency is modeled by assuming that the participant generates categorization responses from a perceptual space that makes no distinction between speakers (see Figure 3a). Speaker separation, on the other hand, is modeled by assuming that the participant first determines whether the speaker is male or female and then generates categorization responses from the relevant male or female pitch height–pitch direction perceptual space (see Figures 3b and 3c).

Several results emerged from Experiment 1. Behaviorally, significant learning occurred within one session of training. Importantly, learning across blocks was similar for all four tonal categories and did not differ between male and female speakers. We found a significant main effect of tone, driven by the fact that T3 was easier to learn than T2 or T4. This is an interesting



finding because T3 is by far the most distinct category for native English speakers. T2 and T4 can be mapped onto existing intonational categories (rising and falling pitch are important cues in intonational processing, indicating, for example, the difference between a question and a statement). This finding is consistent with predictions made by the Speech Learning Model (SLM) (Flege, 1995, 1999). Specifically, the SLM predicts that a native category that is similar to a non-native category, but not identical to it, may interfere with processing and learning of the non-native category. By this account, T2 and T4 categories suffer interference from existing intonational categories; in contrast, T3 is learned more easily because there is no interference from an existing category representation.

The basic computational modeling approach received strong validation from the data. First, Akaike weights, which can be interpreted as the probability that a particular model is the best model given the data set and the set of candidate models, were very large for the best fitting model. When the best fitting model was compared with a random responder model, all Akaike weights were larger than .946. In addition, when the best fitting model assumed speaker separation, the Akaike weights for this model compared with the best fitting non-separation model were all larger than .880. When the best fitting model assumed no speaker separation, the Akaike weights for this model compared with the best fitting separation model were all larger than .893. Second, we found that the number of participants whose data was best fit by a separation model increased across blocks, suggesting that the prevalence of speaker separation increased with experience. Third, we found that participants whose final block of data was best fit by a separation model were more likely to use a speaker-dependent strategy in other blocks and showed speaker separation earlier in learning than participants whose final block of data was best fit by a non-separation model. Finally, amongst final block separators, we found that those who used a reflexive strategy were more accurate than those who used a reflective strategy. Taken together, these results provide strong validation of our modeling approach and demonstrate its usefulness.

Experiment 2

In Experiment 2, we conducted a large-scale replication of Experiment 1 and extended the study by collecting measures of WM capacity. First, we hypothesized that individuals using separate perceptual spaces (separators) would be more effective at learning tone categories. Second, we predicted that those learners would show higher WMC than those who do not use separate perceptual spaces (non-separators). This prediction is very similar to that proposed by Tagarelli and colleagues, who showed that WMC correlated with performance on an explicit, but not an implicit, artificial language

task (Tagarelli, Borges-Mota & Rebuschat, 2011), as well as earlier work by Reber and colleagues, who showed that aptitude measures influence explicit, but not implicit, learning processes (Reber, Walkenfeld & Hernstadt, 1991). Similarly, a recent neuroimaging study showed that in an artificial grammar learning task, individual differences in WM predicted the performance of participants who learned the task explicitly, but not those who learned implicitly (Yang & Li, 2012).

Participants

Ninety-eight monolingual participants aged 18 to 35 were recruited from the undergraduate population at the University of Texas; they were paid $10 per hour for their participation. No participant reported prior exposure to a tone language, or any history of neurological or hearing deficits. All participants provided informed consent and were debriefed following the conclusion of the experiment, in accordance with UT IRB requirements. Participants (n = 16) with poorer performance in the final block than the first were excluded, leaving data from 82 participants for the statistical analyses. Average accuracy in the final block for the excluded participants was 30%, which is close to chance (25%).

Stimuli for tone category training

The stimuli were identical to those in Experiment 1, with the exception that only one male and one female native speaker of Mandarin Chinese were included. This resulted in a shorter training experiment that allowed a WM measure to be included.

Procedure

The procedure was identical to that from Experiment 1 except that participants listened to 40 stimuli per block (four tone categories × five syllables × two speakers) across five blocks of training. In addition, following completion of the tone category learning task, each participant completed the immediate recall portion of the logical memory test in the Wechsler Memory Scale, third edition (WMS-III) (Wechsler, 1997). In this test, two stories were read at conversational rate. One story was read only once, the other twice. Participants were told to listen to the stories and repeat as much as they could recall immediately after each story was read, for a total of three story recall opportunities. The total number of key phrases or words correctly recalled across stories served as their raw score for the task.



Results: Overall accuracy

Figure 5b displays the average accuracy along with standard error bars. We also include average accuracy for participants best fit by a separation or non-separation model (discussed below). Participants showed significant learning across blocks (F(4, 324) = 133.59, p < .0001, partial η² = .623), with accuracy increasing from 40% in block 1 to 74% in block 5. A similar pattern held for tones spoken by female and male speakers: average accuracy increased from 40% in block 1 to 74% in block 5 for tones spoken by female speakers and from 40% in block 1 to 73% in block 5 for tones spoken by male speakers.²

Results: Modeling

The Experiment 1 model fitting and model comparison approach was used.

Best fitting model vs. random responder model

First we compared the Akaike weights from the best fitting model with the fit of the random responder model to determine whether the best fitting model was capturing noise or meaningful strategic responding. The resulting average Akaike weights were .944, .975, .990, .994, and .993 in blocks 1 to 5, respectively. In every case these values were significantly above 0.5 based on a one-sample t-test (all p’s < .0001).

Best fitting non-separation model vs. best fitting separation model

Next we compared the Akaike weights from the best fitting separation model against the best fitting non-separation model. When a separation model provided the best account of the data, the Akaike weights ranged from .908 to .944 and in every block were significantly above 0.5 (all p’s < .001). When a non-separation model provided the best account of the data, the Akaike weights ranged from .775 to .853 and in every block were significantly above 0.5 based on a one-sample t-test (all p’s < .01). These findings suggest that the models are capturing meaningful strategic variance in the data.

2 It is worth noting that performance was very similar across the two experiments, despite the fact that four speakers were included (with 80 stimuli per training block) in Experiment 1 and only two (with 40 stimuli per training block) in Experiment 2. In Experiment 1, two male and two female speakers were included (see Figure 3) whereas in Experiment 2, one male and one female speaker were included. Importantly, the perceptual features (pitch height and pitch direction) were very similar across the two male and across the two female speakers. This likely explains why performance was similar in the two studies.

Distribution of best fitting non-separation and separation models

Here we test the hypothesis that speaker separation leads to improved performance and thus that the number of participants best fit by one of the separation models will increase with experience. Figure 6b displays the proportion of participants whose data were best fit by a separation or non-separation model in each block. In support of our hypothesis, a χ² test suggested that the number of separators increased while the number of non-separators decreased from the first to the final block of trials (χ²(1, N = 82) = 22.61, p < .00001).
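The χ² test of that shift can be illustrated with a self-contained sketch; the 2 × 2 table below (first vs. final block by strategy) uses hypothetical counts, not the study's data:

```python
def chi_square_2x2(table):
    # Pearson chi-square statistic for a 2x2 contingency table:
    # rows = block (first vs. final), columns = separator vs. non-separator.
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    obs = [[a, b], [c, d]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # from row/column marginals
            chi2 += (obs[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: 30 separators / 52 non-separators in block 1,
# 63 separators / 19 non-separators in block 5
stat = chi_square_2x2([[30, 52], [63, 19]])
print(round(stat, 2))  # exceeds 3.84, the df = 1 critical value at alpha = .05
```

A statistic above the df = 1 critical value indicates that the proportion of separators changed reliably between the first and final blocks.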

Separation strategy distribution for final block separators and final block non-separators

Here we examined changes in the use of separation strategies across blocks by comparing the number of blocks of trials best fit by the separation model for final block separators and final block non-separators. As predicted (see Figure 7c), final block separators (3.84 blocks) used separation strategies in significantly more blocks than final block non-separators (1.58 blocks) (F(1, 80) = 56.90, p < .001, partial η² = .416).

We also examined the first block of trials for which a separation model provided the best fit to the data for final block separators and non-separators. As predicted (see Figure 7d), final block separators began using separate perceptual spaces earlier (1.95 blocks) than final block non-separators (2.68 blocks) (F(1, 80) = 5.09, p < .05, partial η² = .060).

Learning curves for final block separators and final block non-separators

Figure 5b shows the learning curves for final block separators and final block non-separators. A 2 (model strategy) × 5 (block) mixed ANOVA was conducted on these data. We observed a main effect of model strategy (F(1, 80) = 19.34, p < .001, partial η² = .195), with performance significantly better for separators (.67) than for non-separators (.44). We observed a main effect of block (F(4, 320) = 78.28, p < .001, partial η² = .495), suggesting significant learning. We also observed a significant model strategy by block interaction (F(4, 320) = 4.33, p < .005, partial η² = .051). Post hoc analyses suggested that performance of the separators was superior to that of the non-separators in every block (all p's < .05). Both groups showed significant learning, but a comparison of performance in block 1 with that in block 5 suggested that separators showed greater learning (an increase of .37) than non-separators (an increase of .22) (F(1, 80) = 8.64, p < .005, partial η² = .097).
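The partial η² values reported for the mixed ANOVA can be recovered directly from each F statistic and its degrees of freedom. A short sketch that reproduces the two main effects:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    # Standard identity: eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of model strategy: F(1, 80) = 19.34 -> eta_p^2 of about .195
print(round(partial_eta_squared(19.34, 1, 80), 3))
# Main effect of block: F(4, 320) = 78.28 -> eta_p^2 of about .495
print(round(partial_eta_squared(78.28, 4, 320), 3))
```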


A dual-system model of speech category learning 723

Working memory capacity and speaker separation

As outlined above, we expected that individuals who used speaker separation strategies would be more likely to have high WMC. As a test of this hypothesis we compared the WMS-III scores for final block separators and final block non-separators. As predicted, final block separators remembered significantly more story items (mean = 43.22; standard error = 1.21) than non-separators (mean = 37.95; standard error = 1.79), as measured by the WMS-III (F(1, 79) = 4.80, p < .05, partial η² = .057).

Discussion

In Experiment 1, we examined the extent to which reflective and reflexive strategies were used in tone category learning across different blocks. We showed that learners who used a reflexive strategy at the end of training were more accurate at categorization than those who used reflective strategies. Furthermore, our data demonstrated that a majority of learners use speaker-dependent strategies to deal with the multi-speaker variability in the training paradigm. The extent to which this strategy is beneficial for category learning success, and whether it is indeed dependent on WM (and therefore reflective), was not addressed in Experiment 1. In Experiment 2 we used a larger sample and examined WM ability to address both these issues. Our results showed that participants who used a speaker separation strategy showed superior learning to those who did not. Moreover, participants who did not use a speaker separation strategy showed lower WMC than participants who did. Experiment 2 thus shows that reflective strategy use is an important determinant of success in the category learning task.

General discussion

In this study we have provided a computational account of strategies that adults use while learning non-native speech sound categories, using a dual-system framework. We used two experiments to study the learning of Mandarin tone categories by native English speakers with no prior experience of tone languages. Our results demonstrate significant individual differences in the computational strategies participants use during category learning. Importantly, these differences in strategies have a direct bearing on category learning success.

We hypothesized that tone category learning is reflexive-optimal, but that learners may use reflective, speaker-dependent strategies early in training. Consistent with these predictions, we demonstrated in Experiment 1 that learners who used reflexive strategies by the end of training were more accurate in category learning than those who used reflective learning strategies. We showed that successful learners initially separate perceptual spaces based on a reflective strategy ("male/female") to perform category-related computations. Successful learners then shift to a reflexive computational strategy (SPC) by the end of the training session. According to the dual-system model, WMC is a critical component of the reflective learning system. Reflective learning is resource-intensive, requiring WM and executive attention to develop, test, and update verbal rules. We predicted that WMC may therefore be an important determinant of individual differences in category learning success. Indeed, in Experiment 2 we demonstrated that individuals who use the verbalizable speaker-dependent learning strategy also show enhanced learning and have higher WMC, as measured by standardized tests of WM.

The extent to which L2 learning in adulthood is explicit versus implicit has been a significant topic of inquiry in the SLA literature (DeKeyser, 2008). Much of this work has focused on grammar, morphology, and vocabulary learning. In contrast, much less is known about the role of explicit/implicit processes during phonetic learning. Speech sounds are typically difficult to verbalize and involve many redundant dimensions. These features argue for a dominance of the procedure-based reflexive learning system. Indeed, our results show that participants who use reflexive strategies at the end of training learn more accurately than those who use reflective strategies. However, some reflective strategies are advantageous in certain situations. For example, in multi-speaker training paradigms (Lively et al., 1994) speaker variability has been shown to be important for generalizing categories to novel speakers. Indeed, for speech learning to be effective in the real world, categorization performance must transfer to novel speakers. However, a previous tone category learning study showed that not all learners benefit from high speaker variability (Perrachione et al., 2011), but that some may find high variability training too resource-intensive, and may therefore be less successful. Why some learners have difficulty in such a high variability environment is unclear. Our results show large individual differences in the computational strategies that learners use to deal with speaker variability. Tone categories are cued by speaker-dependent and speaker-independent dimensions. We predicted that an initial reflective strategy is to create speaker-dependent perceptual spaces. Our computational modeling results show that reduced effectiveness in using this strategy results in substantially lower category learning success. While a majority of participants use separate perceptual spaces for male and female speakers, some operate with a single perceptual space containing both male and female speakers. Across both experiments, our results demonstrate that learning is enhanced when participants use separate perceptual spaces. However, this is a reflective strategy, and therefore requires WM resources, so individuals with low WMC may have difficulty maintaining separate perceptual spaces for


724 W. Todd Maddox and Bharath Chandrasekaran

different speakers. Individuals with high WMC, on the other hand, have an advantage in learning, a finding that addresses a source of individual differences in category learning success. Our results suggest a trade-off in category learning. Multiple-speaker training benefits real-world generalization, but requires WM resources, and thus may lead to individual differences in learning success.

Previous studies have focused on perceptual differences as a source of individual variability in speech learning success. For example, neuroanatomical and neurophysiological studies show that pre-training differences in auditory regions are a significant predictor of individual differences in successful speech learning (Golestani, Molko, Dehaene, LeBihan & Pallier, 2007; Wong, Chandrasekaran, Garibaldi & Wong, 2011; Wong, Perrachione & Parrish, 2007). Individual differences in phonetic acquisition under natural learning situations were found to relate to neural processing of speech sounds, as evidenced by preattentive change-detection electrophysiological responses, but not to basic psychoacoustic differences between learners (Diaz, Baus, Escera, Costa & Sebastian-Galles, 2008). It is interesting that functional connectivity between frontal and parietal brain regions at rest also relates to individual differences in speech learning success (Ventura-Campos, Sanjuán, González, Palomar-García, Rodríguez-Pujadas, Sebastián-Gallés, Deco & Ávila, 2013). Our results show that individual differences in computational strategies may further contribute to variability in speech learning success.

Previous studies have viewed speech learning from the perspective of unsupervised learning (McClelland et al., 2002; Vallabha & McClelland, 2007) or interactive (lexically-mediated), supervised learning mechanisms (McClelland, Mirman & Holt, 2006; Mirman, McClelland & Holt, 2006). The role of feedback-based multiple learning systems, known to play an important role in visual category learning, is unclear. Our theoretical approach argues for two competing learning systems that operate when feedback is provided: a reflective learning system that uses feedback to develop rules, and an implicit, reflexive learning system that unconsciously associates feedback with stimulus and reward. Our computational approach demonstrates that one predictor of learning success is the type of computational strategy that participants employ while learning category differences. Thus, our framework views L2 speech learning as a category learning problem (Lotto, 2000).

We are aware that our computational approach is just a start to understanding dual-learning system influences on L2 speech learning. First, speech categories are extensively multidimensional (Diehl, Lotto & Holt, 2004), a factor that distinguishes speech learning from visual category learning (Lotto, 2000). The two-dimensional perceptual space (pitch height/direction) that is derived likely underestimates the inherent complexity in listeners' processing of linguistic tone. However, our model fits suggest that this approach is a good way to start examining strategy use during speech learning. Future experiments should build on complexity to allow a better understanding of real-world speech processing. Second, we use Mandarin tone categories to evaluate speech learning. Previous research suggests several similarities between segmental (vowels and consonants) and suprasegmental speech learning. From a neural perspective, the extent to which lexical tones behave like segmental information has been extensively studied (Gandour & Dardarananda, 1983; Gandour, Wong, Hsieh, Weinzapfel, Van Lancker & Hutchins, 2000; Gandour, Dzemidzic, Wong, Lowe, Tong, Hsieh, Satthamnuwong & Lurito, 2003; Gandour, Wong & Hutchins, 1998; Klein, Zatorre, Milner & Zhao, 2001; Xu, Gandour & Francis, 2006). For the purpose of this paper, three main points are particularly relevant. First, native speakers use a left hemisphere-dominant network to process linguistic tones that is indistinguishable from the network used to process other speech sound categories (Gandour et al., 1998; Gandour & Dardarananda, 1983; Wong et al., 2004b). Second, when non-native participants learn lexical tone categories, there is increased activity in the left-hemisphere anterior and posterior language regions (Wang, Sereno, Jongman & Hirsch, 2003b). Third, native listeners process Mandarin tones categorically, in a manner similar to consonants (Xi, Zhang, Shu, Zhang & Li, 2010). While this suggests a certain degree of consistency in the processing of linguistically "tainted" stimuli, the extent to which our dual-system approach can be applied to segmental learning is unclear. The SPC model has been successfully applied to the examination of vowel categorization (Maddox et al., 2002), suggesting a possible generalization to segmental learning. The applicability of our approach to learning segmental information is an important direction for future research. Finally, our data suggest that successful learners use separate perceptual spaces for male and female speakers to deal with a high degree of variability across speakers. We argue that this strategy is an explicit process, because participants who use this strategy have higher WM ability than those who do not. However, many studies have shown that normalizing for speaker differences is an automatic process (Holt, 2006; Huang & Holt, 2012; Laing, Liu, Lotto & Holt, 2012), reliant on general auditory processes such as neural adaptation to the long-term average spectrum of speech rather than on cognitive effort. We reconcile these differences with the fact that processing speaker information while categorizing L1 speech sounds may be fundamentally different from the role of speaker information during the learning of L2 speech categories. Initial learning of categories in a high speaker variability environment may be an effortful process requiring speaker-dependent analysis.



Indeed, a number of studies demonstrate a processing cost in multi-speaker paradigms, as well as showing more cognitive effort in mixed-speaker presentations than blocked-speaker presentations (Wong et al., 2004). However, with practice, and a switch to reflexive analysis within the perceptual space, such effortful analysis may not be necessary. Although the current experiments were not designed to distinguish between various models of speaker normalization, we believe that the computational modeling approach developed here could contribute to this research topic.

In conclusion, our computational modeling results demonstrate that learners use a variety of reflective and reflexive strategies to learn new categories. Successful learners use a combination of both types of strategy and are likely to become more reflexive with practice. The computational modeling approach developed here provides a foundation for future work on L2 speech learning and perception.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.

Alexander, J. A., Wong, P. C. M., & Bradlow, A. R. (2005). Lexical tone perception in musicians and non-musicians. Paper presented at INTERSPEECH-2005.

Arnauld, E., Jeantet, Y., Arsaut, J., & Desmotes-Mainard, J. (1996). Involvement of the caudal striatum in auditory processing: c-fos response to cortical application of picrotoxin and to auditory stimulation. Molecular Brain Research, 41, 27–35.

Ashby, F. G. (1992). Multivariate probability distributions. In F. G. Ashby (ed.), Multidimensional models of perception and cognition, pp. 1–34. Hillsdale, NJ: Erlbaum.

Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105, 442–481.

Ashby, F. G., & Ell, S. W. (2001). The neurobiology of human category learning. Trends in Cognitive Sciences, 5, 204–210.

Ashby, F. G., & Ennis, J. M. (2006). The role of the basal ganglia in category learning. Psychology of Learning and Motivation, 46, 1–36.

Ashby, F. G., & Maddox, W. T. (2005). Human category learning. Annual Review of Psychology, 56, 149–178.

Ashby, F. G., & Maddox, W. T. (2011). Human category learning 2.0. Annals of the New York Academy of Sciences, 1224, 147–161.

Ashby, F. G., Maddox, W. T., & Lee, W. W. (1994). On the dangers of averaging across subjects when using multidimensional scaling or the similarity-choice model. Psychological Science, 5, 144–151.

Ashby, F. G., & O'Brien, J. B. (2005). Category learning and multiple memory systems. Trends in Cognitive Sciences, 9, 83–89.

Ashby, F. G., & Spiering, B. J. (2004). The neurobiology of category learning. Behavioral and Cognitive Neuroscience Reviews, 3, 101–113.

Ashby, F. G., Maddox, W. T., & Bohil, C. J. (2002). Observational versus feedback training in rule-based and information-integration category learning. Memory & Cognition, 30, 666–677.

Best, C. T. (1993). Emergence of language-specific constraints in perception of nonnative speech: A window on early phonological development. Developmental Neurocognition: Speech and Face Processing in the First Year of Life, 69, 289–304.

Best, C. T., Morrongiello, B., & Robson, R. (1981). Perceptual equivalence of acoustic cues in speech and nonspeech perception. Perception & Psychophysics, 29, 191–211.

Best, C. T., & Tyler, M. D. (2007). Nonnative and second-language speech perception: Commonalities and complementarities. In O.-S. Bohn & M. J. Munro (eds.), Language experience in second language speech learning: In honor of James Emil Flege, pp. 13–34. Amsterdam: Benjamins.

Bradlow, A. R. (2008). Training non-native language sound patterns. In J. G. Hansen Edwards & M. L. Zampini (eds.), Phonology and second language acquisition, pp. 287–308. Amsterdam: Benjamins.

Bradlow, A. R., Akahane-Yamada, R., Pisoni, D. B., & Tohkura, Y. (1999). Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production. Perception & Psychophysics, 61, 977–985.

Cheour, M., Ceponiene, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K., & Naatanen, R. (1998). Development of language-specific phoneme representations in the infant brain. Nature Neuroscience, 1, 351–353.

Cleeremans, A., & Dienes, Z. (2008). Computational models of implicit learning. In R. Sun (ed.), Cambridge handbook of computational psychology, pp. 396–421. Cambridge: Cambridge University Press.

DeCaro, M. S., Carlson, K. D., Thomas, R. D., & Beilock, S. L. (2009). When and how less is more: Reply to Tharp and Pickering. Cognition, 111, 397–403.

DeCaro, M. S., Thomas, R. D., & Beilock, S. L. (2008). Individual differences in category learning: Sometimes less working memory capacity is better than more. Cognition, 107, 284–294.

DeKeyser, R. (2008). Implicit and explicit learning. In C. Doughty & M. H. Long (eds.), The handbook of second language acquisition, pp. 313–348. Oxford: Blackwell.

Diaz, B., Baus, C., Escera, C., Costa, A., & Sebastian-Galles, N. (2008). Brain potentials to native phoneme discrimination reveal the origin of individual differences in learning the sounds of a second language. Proceedings of the National Academy of Sciences of the United States of America, 105, 16083–16088.

Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of Psychology, 55, 149–179.

Estes, W. K. (1956). The problem of inference from curves based on group data. Psychological Bulletin, 53, 134–140.

Evans, J. L., Saffran, J. R., & Robe-Torres, K. (2009). Statistical learning in children with specific language impairment.



Journal of Speech Language and Hearing Research, 52, 321–335.

Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, pp. 233–277. Timonium, MD: York Press.

Flege, J. E. (1999). Age of learning and second language speech. In D. Birdsong (ed.), Second language acquisition and the critical period hypothesis, pp. 101–131. Mahwah, NJ: Erlbaum.

Francis, A. L., & Nusbaum, H. C. (2002). Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance, 28, 349–366.

Francis, A. L., Ciocca, V., Ma, L., & Fenn, K. (2008). Perceptual learning of Cantonese lexical tones by tone and non-tone language speakers. Journal of Phonetics, 36, 268–294.

Gandour, J. (1978). Perceived dimensions of thirteen tones: a multidimensional scaling investigation. Phonetica, 35, 169–179.

Gandour, J. (1983). Tone perception in Far Eastern languages. Journal of Phonetics, 11, 149–175.

Gandour, J., & Dardarananda, R. (1983). Identification of tonal contrasts in Thai aphasic patients. Brain & Language, 18, 98–114.

Gandour, J., Wong, D., & Hutchins, G. (1998). Pitch processing in the human brain is influenced by language experience. Neuroreport, 9, 2115–2119.

Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L., Satthamnuwong, N., & Lurito, J. (2003). Temporal integration of speech prosody is shaped by language experience: An fMRI study. Brain and Language, 84, 318–336.

Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., & Hutchins, G. D. (2000). A crosslinguistic PET study of tone perception. Journal of Cognitive Neuroscience, 12, 207–222.

Golestani, N., Molko, N., Dehaene, S., LeBihan, D., & Pallier, C. (2007). Brain structure predicts the learning of foreign speech sounds. Cerebral Cortex, 17, 575–582.

Goudbeek, M., Cutler, A., & Smits, R. (2008). Supervised and unsupervised learning of multidimensionally varying non-native speech categories. Speech Communication, 50, 109–125.

Hattori, K., & Iverson, P. (2009). English /r/–/l/ category assimilation by Japanese adults: Individual differences and the link to identification accuracy. Journal of the Acoustical Society of America, 125, 469–479.

Hay, J. F., Pelucchi, B., Graf Estes, K., & Saffran, J. R. (2011). Linking sounds to meanings: Infant statistical learning in a natural language. Cognitive Psychology, 63, 93–106.

Hernandez, A. E., & Li, P. (2007). Age of acquisition: Its neural and computational mechanisms. Psychological Bulletin, 133, 638–650.

Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition, 92, 67–99.

Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402.

Hikosaka, O., Sakamoto, Y., & Usui, S. (1989). Functional properties of monkey caudate neurons: III. Activities related to expectation of target and reward. Journal of Neurophysiology, 61, 814–832.

Holt, L. L. (2006). The mean matters: Effects of statistically defined nonspeech spectral distributions on speech categorization. The Journal of the Acoustical Society of America, 120, 2801–2817.

Holt, L. L., & Lotto, A. J. (2008). Speech perception within an auditory cognitive science framework. Current Directions in Psychological Science, 17, 42–46.

Holt, L. L., & Lotto, A. J. (2010). Speech perception as categorization. Attention, Perception & Psychophysics, 72, 1218–1227.

Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones. Cambridge: Cambridge University Press.

Huang, J., & Holt, L. L. (2012). Listening for the norm: Adaptive coding in speech categorization. Frontiers in Psychology, 3, 10.

Hulstijn, J. H. (2005). Theoretical and empirical issues in the study of implicit and explicit second-language learning. Studies in Second Language Acquisition, 27, 129–140.

Hume, E., & Johnson, K. (2001). A model of the interplay of speech perception and phonology. In E. Hume & K. Johnson (eds.), The role of speech perception in phonology, pp. 3–26. San Diego, CA: Academic Press.

Iverson, P., Kuhl, P. K., Akahane-Yamada, R., Diesch, E., Tohkura, Y., Kettermann, A., & Siebert, C. (2003). A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition, 87, B47–57.

Klein, D., Zatorre, R. J., Milner, B., & Zhao, V. (2001). A cross-linguistic PET study of tone perception in Mandarin Chinese and English speakers. Neuroimage, 13, 646–653.

Kraljic, T., & Samuel, A. G. (2005). Perceptual learning for speech: Is there a return to normal? Cognitive Psychology, 51, 141–178.

Kraljic, T., & Samuel, A. G. (2006). Generalization in perceptual learning for speech. Psychonomic Bulletin and Review, 13, 262–268.

Kraljic, T., & Samuel, A. G. (2007). Perceptual adjustments to multiple speakers. Journal of Memory and Language, 56, 1–15.

Krashen, S. (1982). Principles and practice in second language acquisition. Oxford: Pergamon.

Laing, E. J., Liu, R., Lotto, A. J., & Holt, L. L. (2012). Tuned with a tune: Talker normalization via general auditory processes. Frontiers in Psychology, 3, 203.

Lim, S. J., & Holt, L. L. (2011). Learning foreign sounds in an alien world: Videogame training improves non-native speech categorization. Cognitive Science, 35, 1390–1405.

Lively, S. E., Logan, J. S., & Pisoni, D. B. (1993). Training Japanese listeners to identify English /r/ and /l/. II: The role of phonetic environment and talker variability in learning new perceptual categories. Journal of the Acoustical Society of America, 94, 1242–1255.

Lively, S. E., Pisoni, D. B., Yamada, R. A., Tohkura, Y., & Yamada, T. (1994). Training Japanese listeners to identify



English /r/ and /l/. III: Long-term retention of new phonetic categories. Journal of the Acoustical Society of America, 96, 2076–2087.

Lotto, A. J. (2000). Language acquisition as complex category formation. Phonetica, 57, 189–196.

Maddox, W. T. (1999). On the dangers of averaging across observers when comparing decision bound models and generalized context models of categorization. Perception & Psychophysics, 61, 354–375.

Maddox, W. T., & Ashby, F. G. (2004). Dissociating explicit and procedural-learning based systems of perceptual category learning. Behavioral Processes, 66, 309–332.

Maddox, W. T., Ing, A. D., & Lauritzen, J. S. (2006). Stimulus modality interacts with category structure in perceptual category learning. Perception & Psychophysics, 68, 1176–1190.

Maddox, W. T., Molis, M. R., & Diehl, R. L. (2002). Generalizing a neuropsychological model of visual categorization to auditory categorization of vowels. Perception & Psychophysics, 64, 584–597.

Maddox, W. T., Love, B. C., Glass, B. D., & Filoteo, J. V. (2008). When more is less: Feedback effects in perceptual category learning. Cognition, 108, 578–589.

Maddox, W. T., Filoteo, J. V., Lauritzen, J. S., Connally, E., & Hejl, K. D. (2005). Discontinuous categories affect information-integration but not rule-based category learning. Journal of Experimental Psychology: Learning, Memory and Cognition, 31, 654–669.

McClelland, J. L., Fiez, J. A., & McCandliss, B. D. (2002). Teaching the /r/–/l/ discrimination to Japanese adults: Behavioral and neural aspects. Physiology & Behavior, 77, 657–662.

McClelland, J. L., Mirman, D., & Holt, L. L. (2006). Are there interactive processes in speech perception? Trends in Cognitive Sciences, 10, 363–369.

McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419.

Mirman, D., McClelland, J. L., & Holt, L. L. (2006). An interactive Hebbian account of lexically guided tuning of speech perception. Psychonomic Bulletin & Review, 13, 958–965.

Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio, M., Alku, P., Ilmoniemi, R. J., Luuk, A., Allik, J., Sinkkonen, J., & Alho, K. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385, 432–434.

Nomura, E. M., & Reber, P. J. (2008). A review of medial temporal lobe and caudate contributions to visual category learning. Neuroscience and Biobehavioral Reviews, 32, 279–291.

Nomura, E. M., Maddox, W. T., Filoteo, J. V., Ing, A. D., Gitelman, D. R., Parrish, T. B., Mesulam, M.-M., & Reber, P. J. (2007). Neural correlates of rule-based and information-integration visual category learning. Cerebral Cortex, 17, 37–43.

Paradis, M. (1985). On the representation of two languages inone brain. Language Sciences, 7, 1–39.

Paradis, M. (2004). A neurolinguistic theory of bilingualism.Amsterdam: Benjamins.

Perrachione, T. K., Lee, J., Ha, L. Y. & Wong, P. C. (2011b).Learning a novel phonological contrast depends oninteractions between individual differences and trainingparadigm design. Journal of the Acoustical Society ofAmerica, 130, 461.

Petrides, M., & Pandya, D. N. (1988). Association fiber pathwaysto the frontal cortex from the superior temporal region in therhesus monkey. Journal of Comparative Neurology, 273,52–66.

Poldrack, R. A., & Foerde, K. (2008). Category learning and thememory systems debate. Neuroscience and BiobehavioralReviews, 32, 197–205.

Poldrack, R. A., & Packard, M. G. (2003). Competition amongmultiple memory systems: Converging evidence fromanimal and human brain studies. Neuropsychologia, 41,245–251.

Poldrack, R. A., Clark, J., Pare-Blagoev, E. J., Shohamy, D.,Creso Moyano, J., Myers, C., & Gluck, M. A. (2001).Interactive memory systems in the human brain. Nature,414, 546–550.

Reber, P. J. (2013). The neural basis of implicit learningand memory: A review of neuropsychological andneuroimaging research. Neuropsychologia, 51, 2026–2042.

Reber, A. S., Walkenfeld, F. F., & Hernstadt, R. (1991). Implicitand explicit learning: Individual differences and IQ.Journal of Experimental Psychology: Learning, Memory& Cognition, 17, 888–896.

Romberg, A. R., & Saffran, J. R. (2010). Statistical learningand language acquisition. Wiley Interdisciplinary Reviews:Cognitive Science, 1, 906–914.

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statisticallearning by 8-month-old infants. Science, 274, 1926–1928.

Samuel, A. G., & Kraljic, T. (2009). Perceptual learning forspeech. Attention, Perception & Psychophysics, 71, 1207–1218.

Schmidt, R. (1995). Consciousness and foreign languagelearning: A tutorial on the role of attention and awarenessin learning. In R. Schmidt (ed.), Attention and awareness inforeign language learning, 1–63. Honolulu, HI: Universityof Hawai‘i Press.

Schmidt, R. (2012). Attention, awareness, and individualdifferences in language learning. Perspectives on IndividualCharacteristics and Foreign Language Education, 6, 27.

Seger, C. A. (2008). How do the basal ganglia contributeto categorization? Their roles in generalization, responseselection, and learning via feedback. Neuroscience andBiobehavioral Reviews, 32, 265–278.

Seger, C. A. & Cincotta, C. M. (2005). The roles of thecaudate nucleus in human classification learning. Journalof Neuroscience, 25, 2941–2951.

Seger, C. A., & Miller, E. K. (2010). Category learning in the brain. Annual Review of Neuroscience, 33, 203–219.

Tagarelli, K., Borges-Mota, M., & Rebuschat, P. (2011). The role of working memory in implicit and explicit language learning. Proceedings of the 33rd Annual Conference of the Cognitive Science Society, pp. 2061–2066. Austin, TX: Cognitive Science Society.

Tricomi, E., Delgado, M. R., McCandliss, B. D., McClelland, J. L., & Fiez, J. A. (2006). Performance feedback drives caudate activation in a phonological learning task. Journal of Cognitive Neuroscience, 18, 1029–1043.

Ullman, M. T. (2004). Contributions of memory circuits to language: The declarative/procedural model. Cognition, 92, 231–270.

Ullman, M. T. (2006). The declarative/procedural model and the shallow structure hypothesis. Applied Psycholinguistics, 27, 97–105.

Vallabha, G. K., & McClelland, J. L. (2007). Success and failure of new speech category learning in adulthood: Consequences of learned Hebbian attractors in topographic maps. Cognitive, Affective & Behavioral Neuroscience, 7, 53–73.

Ventura-Campos, N., Sanjuán, A., González, J., Palomar-García, M.-A., Rodríguez-Pujadas, A., Sebastián-Gallés, N., Deco, G., & Ávila, C. (2013). Spontaneous brain activity predicts learning ability of foreign sounds. Journal of Neuroscience, 33, 9295–9305.

Wagenmakers, E. J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11, 192–196.

Wang, Y., Jongman, A., & Sereno, J. A. (2003). Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training. Journal of the Acoustical Society of America, 113, 1033–1043.

Wang, Y., Sereno, J. A., Jongman, A., & Hirsch, J. (2003). fMRI evidence for cortical modification during learning of Mandarin lexical tone. Journal of Cognitive Neuroscience, 15, 1019–1027.

Wechsler, D. (1997). Wechsler Adult Intelligence Scale (3rd edn). San Antonio, TX: Harcourt Brace.

Wickens, T. D. (1982). Models for behavior: Stochastic processes in psychology. San Francisco, CA: W. H. Freeman.

Wilson, C. J. (1995). The contribution of cortical neurons to the firing pattern of striatal spiny neurons. In J. C. Houk, J. L. Davis & D. G. Beiser (eds.), Models of information processing in the basal ganglia, pp. 29–50. Cambridge, MA: MIT Press.

Wong, F. C., Chandrasekaran, B., Garibaldi, K., & Wong, P. C. (2011). White matter anisotropy in the ventral language pathway predicts sound-to-word learning success. Journal of Neuroscience, 31, 8780–8785.

Wong, P. C., Nusbaum, H. C., & Small, S. L. (2004a). Neural bases of talker normalization. Journal of Cognitive Neuroscience, 16, 1173–1184.

Wong, P. C., Perrachione, T. K., & Parrish, T. B. (2007). Neural characteristics of successful and less successful speech and word learning in adults. Human Brain Mapping, 28, 995–1006.

Xi, J., Zhang, L., Shu, H., Zhang, Y., & Li, P. (2010). Categorical perception of lexical tones in Chinese revealed by mismatch negativity. Neuroscience, 170, 223–231.

Xu, Y., Gandour, J. T., & Francis, A. L. (2006). Effects of language experience and stimulus complexity on the categorical perception of pitch direction. Journal of the Acoustical Society of America, 120, 1063–1074.

Yang, J., & Li, P. (2012). Brain networks of explicit and implicit learning. PLoS ONE, 7, e42993.

Yeterian, E. H., & Pandya, D. N. (1998). Corticostriatal connections of the superior temporal region in rhesus monkeys. Journal of Comparative Neurology, 399, 384–402.

Zhang, Y., Kuhl, P. K., Imada, T., Iverson, P., Pruitt, J., Stevens, E. B., Kawakatsu, M., Tohkura, Y., & Nemoto, I. (2009). Neural signatures of phonetic learning in adulthood: A magnetoencephalography study. NeuroImage, 46, 226–240.