The role of associative history in models of associative learning: A selective review and a hybrid model

M. E. Le Pelley
Cardiff University, Cardiff, UK

Associative learning theories strive to capture the processes underlying and driving the change in strength of the associations between representations of stimuli that develop as a result of experience of the predictive relationships between those stimuli. Historically, formal models of associative learning have focused on two potential factors underlying associative change, namely processing of the conditioned stimulus (in terms of changes in associability) and processing of the unconditioned stimulus (in terms of changes in error). This review constitutes an analysis of the proper role of these two factors, specifically with regard to the way in which they are influenced by associative history (the prior training undergone by cues). A novel “hybrid” model of associative learning is proposed and is shown to provide a more satisfactory account of the effects of associative history on subsequent learning than any previous single-process theory.

It has long been a goal of experimental psychologists to discover how animals and humans are able to learn about relationships between stimuli and events in the world around them. For it is this ability to learn about predictive relationships that enables organisms to adapt and survive in a changing environment. Over a century has passed since Thorndike (1898) proposed the first theory of associative learning, and debate over the proper way to characterize the learning ability of animals shows little sign of abating. In recent years this debate has tended to be directed towards two outstanding issues relevant to any model of associative learning: (1) the way in which stimuli are represented by the learning system (either as sets of independent elements or as more holistic configurations); and (2) the mechanics of the learning process itself, that is to say, the factors that determine the amount of associative change (i.e., learning) that a given cue will undergo on a learning episode. The former topic has been the subject of several recent reviews (Pearce, 1994; Pearce & Bouton, 2001; Wagner, 2003; Wagner & Brandon, 2001). This paper, in contrast, concentrates almost exclusively on the second issue, that is, on the factors that influence the extent to which a representation of a given cue (or configuration of cues) will engage the learning process. More specifically, this review seeks to address the question of how best to characterize the effect of associative history (the prior training that a stimulus has received) on learning by examining its influence in various representative formal models. We begin by looking at the role of associative history in models of learning based on processing of the unconditioned stimulus (US), before providing empirical evidence to suggest that such theories do not go far enough in their characterization of how the previous training undergone by a conditioned stimulus (CS) affects subsequent learning about that stimulus. Instead, it seems that the processing received by a stimulus can vary as a function of the associative history of that stimulus: This leads on to discussion of two of the most influential CS-processing models of learning, the Mackintosh (1975) theory and Pearce and Hall’s (1980) model.

Correspondence should be addressed to Mike Le Pelley, School of Psychology, Cardiff University, Wales, CF10 3YG, UK. Email: [email protected]

I would like to thank John M. Pearce, I. P. L. McLaren, and S. M. Oakeshott for their helpful advice during the preparation of this article. This work was supported by a grant from the ESRC.

© 2004 The Experimental Psychology Society
http://www.tandf.co.uk/journals/pp/02724995.html DOI: 10.1080/02724990344000141

THE QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2004, 57B (3), 193–243

The view of associative history taken by CS-processing models differs profoundly from that taken by US-processing models. Moreover, the approaches taken by different CS-processing models differ from one another: The Mackintosh and Pearce–Hall models, for instance, take (in some sense) opposing views of the way in which processing of a stimulus changes as a result of experience of predictive relationships involving that stimulus. Given the existence of evidence supporting each of these conflicting views of the role of associative history, it rapidly becomes clear that none of these approaches alone is sufficient to account for the range of effects observed empirically. As a possible solution to this problem, I present a “hybrid” model of associative learning that attempts to reconcile the various empirical demonstrations of the effects of associative history by borrowing from all of the previously mentioned approaches. This model is shown to have more explanatory power than any of its predecessors, providing a more satisfactory account of an array of experimental findings than any existing “single process” model.

Before going any further, it would seem sensible to set out the limitations of this article. A complete review of the effects of previous experience of stimuli on the subsequent learning that they undergo, and the models that have been used to describe these effects, would take far more space than is available here. As a consequence the current analysis must be somewhat selective in its focus. For example, there will be only brief discussion of the effects of nonreinforced preexposure to stimuli on the subsequent learning that they undergo, either in terms of subsequent conditioning (which makes contact with the phenomenon of latent inhibition) or in terms of discrimination learning (perceptual learning). For an in-depth discussion of empirical studies of preexposure effects in relation to various models of associative learning, the interested reader is referred to Hall (1991). On the theoretical side, the discussion here is confined to acquisition-based models of associative learning, and more specifically models that view effects of associative history as acquisition based, that is, based on a property that changes incrementally as a result of experience with stimuli and their predictive relationships. This precludes discussion of models of associative interference and the characterization of associative history effects that they offer (e.g., Bouton, 1993, 1994), or models of response competition (e.g., Miller & Matzel, 1988). Despite this necessarily narrow focus, this review aims to introduce a number of general principles of associative history and its effects on learning, and to relate these principles to various formal models as a first step towards understanding how learning in the past affects learning in the future.


US-PROCESSING MODELS

Separable error term models

Among the earliest formal models of associative learning are those based on the standard linear operator (Bush & Mosteller, 1951; Estes, 1950; Kendler, 1971), in which the increment in associative strength undergone by a cue is a linearly decreasing function of the strength of that cue. For example, Bush and Mosteller proposed that the associative change undergone by a CS, A, on a given learning episode was defined by:

$$\Delta V_A = \alpha_A \beta (\lambda - V_A) \tag{1}$$

where $V_A$ is the associative strength of Cue A; $\alpha_A$ represents the associability of A and is a function of the cue’s intensity or salience; $\beta$ is a learning-rate parameter reflecting the intensity of the US occurring on that trial; and $\lambda$ is the asymptote of conditioning supportable by that US.¹

According to this early view, then, modulation of associative change is determined solely by changes in processing of the US. The “error term”, $(\lambda - V_A)$, represents the extent to which the US occurring on a given trial is predicted by Stimulus A. If the US is not well predicted by a CS then it is able to support more learning with respect to that CS than if it is already well predicted. The contribution of the CS to learning is fixed, taking a value determined by its intensity. The Bush and Mosteller model thus assumes that the associative change occurring on a given trial depends only on the current associative strength of a stimulus and the asymptote of conditioning supportable by the US presented on that trial, not on how the current associative strength was reached (the “associative path”); this is known as the assumption of path independence.

Moreover, Bush and Mosteller’s (1951) theory assumes that stimuli are treated wholly independently of one another in the determination of associative change. If two cues, A and B, are presented on the same learning episode, the error term for A will be $(\lambda - V_A)$ while that for B will be $(\lambda - V_B)$: This model employs a separate error term for each presented stimulus. As such, the associative change undergone by each cue will be independent of the current associative strength of the other.
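As a minimal sketch of Equation 1 and of the cue independence it implies, the following Python fragment computes the change for two cues reinforced on the same trial, using a separate error term for each. The function name, the starting strengths, and the parameter values are illustrative assumptions and are not taken from Bush and Mosteller (1951).

```python
def linear_operator_delta(v, alpha, beta, lam):
    """Change in associative strength for one cue under Equation 1."""
    return alpha * beta * (lam - v)

# Illustrative (assumed) parameter values.
ALPHA, BETA, LAM = 0.5, 0.3, 1.0

# Cue A starts strong and Cue B starts novel; both are reinforced together.
# Each cue's change is computed from its own strength only.
strengths = {"A": 0.9, "B": 0.0}
deltas = {cue: linear_operator_delta(v, ALPHA, BETA, LAM)
          for cue, v in strengths.items()}

# A single cue trained alone: each increment is a linearly decreasing
# function of the cue's current strength, however that strength was reached.
strength, increments = 0.0, []
for _ in range(5):
    dv = linear_operator_delta(strength, ALPHA, BETA, LAM)
    increments.append(dv)
    strength += dv
```

Because B's change is computed without reference to A, B gains as much on this trial as it would if A were absent or novel, which is why a model of this form cannot by itself produce competition between cues.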

This assumption of cue independence has been challenged by a number of studies demonstrating that cues presented in compound can, and will, interact and compete for associative strength. This is most powerfully demonstrated in the phenomenon of blocking (Kamin, 1969), which refers to the finding that the gain in excitatory strength of a cue, B, following reinforcement of an AB compound is much reduced if Cue A has previously been trained as being a good predictor of that US. Learning does not simply progress with each cue independently. Instead the two cues seem to compete for a limited amount of associative strength.


¹ In fact Bush and Mosteller’s model described the change in probability with which a conditioned response would be emitted in the presence of a given CS, rather than the change in associative strength of the CS. As long as it is assumed that probability of conditioned response is monotonically related to associative strength, however, Bush and Mosteller’s model can reasonably be rephrased as Equation 1 (this is also the formulation offered by Kendler, 1971).

The Rescorla–Wagner (1972) summed error term model

This idea of competition for associative strength is encapsulated by the model of learning proposed by Rescorla and Wagner (1972), which states that:

$$\Delta V_A = \alpha_A \beta (\lambda - \Sigma V) \tag{2}$$

where $\Sigma V$ is the summed associative strength of all currently presented cues. Hence the Rescorla–Wagner model states that the error governing associative change for any cue on a trial is based on the combined associative strength of all cues present on that trial, allowing it to account for a number of observations of interaction between cues in the acquisition of associative strength. For example, the use of this “summed error term” is essential to the model’s explanation of blocking. According to Equation 2, when the AB compound is followed by reinforcement, the associative change undergone by B will be determined by the discrepancy between $\lambda$ and the combined associative strengths of A and B. As a result of pretraining with A (such that $V_A \approx \lambda$), though, this discrepancy will be near zero, and hence B will gain little associative strength.
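The blocking account just described can be sketched in a few lines of Python. This is an illustration of Equation 2 only: the function names, the number of trials, and the parameter values (including an asymptote of 1.0) are assumptions made for the example.

```python
def rescorla_wagner_trial(V, present, lam, alpha=0.5, beta=0.3):
    """Apply Equation 2 in place to every cue present on one trial."""
    error = lam - sum(V[c] for c in present)  # summed error term
    for c in present:
        V[c] += alpha * beta * error

def run_blocking(pretrain_trials, compound_trials=8, lam=1.0):
    """A+ pretraining (possibly zero trials) followed by AB+ trials."""
    V = {"A": 0.0, "B": 0.0}
    for _ in range(pretrain_trials):
        rescorla_wagner_trial(V, ["A"], lam)
    for _ in range(compound_trials):
        rescorla_wagner_trial(V, ["A", "B"], lam)
    return V

blocked = run_blocking(pretrain_trials=50)  # A pretrained to near asymptote
control = run_blocking(pretrain_trials=0)   # A and B both novel
```

In the pretrained group the summed error is already close to zero when B is introduced, so `blocked["B"]` stays near zero, whereas in the control group A and B share the available associative strength.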

A second strength of the formulation offered by Rescorla and Wagner (1972) is its ability to account for the phenomenon of conditioned inhibition (Hearst, 1972; Pearce, Nicholas, & Dickinson, 1982; Rescorla, 1969a, 1969b). In the first stage of the experiment by Pearce et al. (1982) rats experienced trials on which a clicker signalled electric shock in a conditioned suppression procedure. In the second stage, trials on which the clicker was followed by shock were intermixed with trials on which a compound of the clicker and a light was not followed by shock. Thus the light signalled the absence of a shock that was otherwise predicted by the presence of the clicker. Following this training, presentation of the clicker suppressed lever pressing for food, indicating that the rats had learnt the clicker → shock association. The compound of the clicker and the light, on the other hand, produced considerably less suppression of lever pressing, indicating that the rats had learnt that the light signalled the absence of the shock, and that this learning opposed the excitatory learning about the clicker. In support of the idea that the light had acquired inhibitory potential as a result of this training, Pearce et al. found that when the light was subsequently paired with shock, conditioning proceeded more slowly than for a cue that had not previously received this inhibitory training (the retardation test for conditioned inhibition). Furthermore, they demonstrated that a light given conditioned inhibition training would also counteract the conditioned suppression produced by a CS that was trained separately; that is, it was able to transfer its inhibitory potential to a novel excitor (the summation test for conditioned inhibition).

Conditioned inhibition provides another demonstration of the interaction of cues in the learning process. In standard inhibitory conditioning procedures, a cue, A, will only become a conditioned inhibitor as a result of nonreinforcement in the presence of an excitor, B. If cues are treated entirely independently in the determination of associative change, there can be no way for the excitatory potential of B to influence the inhibitory learning undergone by A. The Rescorla–Wagner model, on the other hand, by specifying competition between cues for a limited resource (amount of available learning) by means of a summed error term, is able to account for the development of conditioned inhibition. The model assumes that $\lambda$ will be zero on nonreinforced AB trials, while $V_B > 0$ as a result of separate excitatory training of B. Therefore A will acquire negative associative strength on these trials. As a result, excitatory conditioning with A will occur slowly if it is subsequently paired with the US, and the presence of A will be able to suppress responding to any other excitatory CS: A will pass the retardation and summation tests as described above.
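A hedged sketch of this account, again using Equation 2, is given below. The cue labels follow the paragraph above (A is the inhibitor, B the excitor); the separately trained excitor C, the novel comparison cue N, the trial counts, and the parameter values are assumptions introduced for the illustration.

```python
def rw_trial(V, present, lam, alpha=0.5, beta=0.3):
    """Apply Equation 2 in place to every cue present on one trial."""
    error = lam - sum(V[c] for c in present)  # summed error term
    for c in present:
        V[c] += alpha * beta * error

LAM = 1.0  # assumed asymptote on reinforced trials
V = {"A": 0.0, "B": 0.0, "C": 0.0, "N": 0.0}

# Inhibitory training: B+ trials intermixed with nonreinforced AB- trials.
for _ in range(200):
    rw_trial(V, ["B"], LAM)
    rw_trial(V, ["A", "B"], 0.0)

# A separately trained excitor, used for the summation test.
for _ in range(50):
    rw_trial(V, ["C"], LAM)

v_a_trained = V["A"]                      # negative: A is an inhibitor
summation_compound = V["A"] + V["C"]      # net prediction for the AC compound

# Retardation test: pair A, and separately the novel cue N, with the US.
for _ in range(5):
    rw_trial(V, ["A"], LAM)
    rw_trial(V, ["N"], LAM)
```

After training, `v_a_trained` is negative, the AC compound supports less net prediction than C alone, and after five reinforced trials A still lags behind the novel cue N.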

The Rescorla–Wagner theory shares with the earlier Bush and Mosteller model the idea that learning is wholly governed by changes in the effectiveness of the US, with a surprising outcome supporting more learning than an outcome that is predicted, while the contribution of the CS to learning is fixed; as such it too is a “US-processing” model of learning. And like Bush and Mosteller (1951), Rescorla and Wagner (1972) also make the assumption of path independence, wherein the associative change undergone by a cue is independent of the associative history of that cue.

In fact, the Rescorla–Wagner model takes an even more extreme view of the assumption of path independence than its forerunner: The use of a summed error term extends the idea of path independence to compounds of cues. In other words, the associative change undergone by the elements of a compound depends only on the current associative strength of that compound and the current outcome, not on the individual strengths of the elements of that compound, and not on how those associative strengths were reached. The validity of this assumption was tested recently by Rescorla (2000), who investigated the distribution of associative change between the elements of a compound made up of an excitor and an inhibitor occurring as a result of reinforcement of that compound. His Experiment 1a employed a magazine approach paradigm, in which rats learnt that certain stimuli (tones, clickers, lights, etc.) predicted the delivery of food. In the first stage of this experiment, rats experienced trial types A+, C+, X+, BX–, DX–. Thus Cues A and C were initially trained as equivalent excitors (i.e., they predicted the delivery of food), whereas B and D were trained as equivalent inhibitors (i.e., they predicted the absence of the US that would otherwise have been expected on the basis of the presence of X). In Stage 2 a compound of an excitor (A) and an inhibitor (B) was reinforced (AB+). The question of interest was whether this reinforcement would lead to equal-sized increments in the associative strengths of Cues A and B, or whether one cue would undergo a greater increment than the other. Rescorla addressed this question by looking at responding to compounds AD and BC following Stage 2 training. If these compounds were compared in the absence of AB+ training, they should yield equal responding: Each comprises one excitor (A or C) and one inhibitor (B or D). If AB+ trials led to equal-sized increments in the strengths of A and B, then responding to the AD and BC compounds would remain equal (as each starts Stage 2 at the same level and undergoes the same change). If, however, AB+ trials produced a greater associative increment in the excitatory A than the inhibitory B, then the test should reveal greater responding to AD than to BC. Conversely, a greater increment in $V_B$ than $V_A$ would be evidenced by greater responding to BC than to AD.

The use of the summed associative strength of all presented cues in the error term of the Rescorla–Wagner model means that all cues presented on a given trial must have an identical error term. Applying Equation 2 to Cues A and B on the AB+ trials in Stage 2 of Rescorla’s (2000) experiment gives:

$$\Delta V_A = \alpha_A \beta (\lambda - (V_A + V_B)) \tag{3}$$

and

$$\Delta V_B = \alpha_B \beta (\lambda - (V_A + V_B)) \tag{4}$$

It is easy to see that if the salience ($\alpha$) of A and B is equal (ensured empirically by appropriate counterbalancing of stimuli), Equations 3 and 4 will be identical. In other words, the Rescorla–Wagner model is constrained to predict that A and B will undergo identical increments in associative strength as a result of AB+ trials, despite the fact that A (an excitor) and B (an inhibitor) begin these trials with very different associative strengths ($V_A > 0$, $V_B < 0$). As stated above, the use of a summed error term means that the individual associative strengths of the elements are unimportant in the determination of associative change; all that matters is the overall strength of the cue compound presented on each Stage 2 trial.

Contrary to this fundamental prediction of the Rescorla–Wagner model, Rescorla (2000) found significantly greater responding to BC than to AD on test, indicating that the inhibitory B had undergone a greater increment in associative strength than the excitatory A over the AB+ trials. This finding provides strong evidence against the idea that the assumption of path independence applies to compounds of cues: the distribution of associative change among the elements of a reinforced compound is not independent of the associative status of those separable elements. The results of Rescorla’s study instead fit better with the predictions of Bush and Mosteller’s (1951) earlier model, which employs separable error terms for the different elements present on a given trial and hence predicts that the stimulus whose associative strength is more discrepant from that supportable by the outcome of the trial will undergo the greater associative change. Excitatory training of A in the first stage will ensure that it becomes a good predictor of the US. As such it will have a small error term when it is paired with the US on Stage 2 AB+ trials and hence will undergo little associative change over these trials, since its associative strength will already be near asymptote. B, on the other hand, is trained to predict the absence of the US during Stage 1. Hence the occurrence of the US following presentation of B on AB+ trials will be very surprising (i.e., B will have a large error term), and as a result B will undergo a large increment in associative strength.
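The contrast between the two predictions can be made explicit with a short derivation. It assumes, as in the text, equal saliences ($\alpha_A = \alpha_B = \alpha$) and compares the increments that each model assigns to A and B on a single AB+ trial; the superscript labels RW and BM are introduced here only for convenience.

```latex
\begin{align*}
\text{Rescorla--Wagner (Equation 2):}\quad
  \Delta V_B^{\mathrm{RW}} - \Delta V_A^{\mathrm{RW}}
  &= \alpha\beta\bigl(\lambda - (V_A + V_B)\bigr)
   - \alpha\beta\bigl(\lambda - (V_A + V_B)\bigr) = 0 \\[4pt]
\text{Bush--Mosteller (Equation 1):}\quad
  \Delta V_B^{\mathrm{BM}} - \Delta V_A^{\mathrm{BM}}
  &= \alpha\beta(\lambda - V_B) - \alpha\beta(\lambda - V_A)
   = \alpha\beta\,(V_A - V_B) > 0
  \quad \text{when } V_A > 0 > V_B .
\end{align*}
```

With a summed error term the difference vanishes whatever the starting strengths, whereas with separable error terms the inhibitor B gains more than the excitor A by an amount proportional to the gap between their strengths.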

A model combining separable and summed error terms

So the results of Rescorla’s (2000) experiment provide support for a model incorporating separable error terms in its mechanism governing associative change (see also Rescorla, 2001). But we saw earlier that treating cues independently of one another in the determination of associative change leads to problems in accounting for various phenomena of cue competition such as blocking, and it was for this reason that the Rescorla–Wagner model incorporated a summed error term. It is possible to resolve this apparent conflict within a US-processing model of learning by effectively combining the Bush and Mosteller and Rescorla–Wagner theories, to yield a model of learning in which the effectiveness of a summed error term in governing the associative change undergone by a cue is modulated by the separable error for that cue alone. It is relatively straightforward to formulate such a model (cf. Brandon, Vogel, & Wagner, 2003). One of the simplest approaches is outlined here.

The associative change undergone by Cue A on a trial on which $(\lambda - \Sigma V) > 0$ (i.e., a trial that will support excitatory learning) is given by:

$$\Delta V_A = \alpha_A \beta \times (\lambda - \Sigma V) \times |\lambda - V_A| \tag{5}$$

and the associative change on a trial on which $(\lambda - \Sigma V) < 0$ (i.e., a trial that will support inhibitory learning) is given by:

$$\Delta V_A = \alpha_A \beta \times (\lambda - \Sigma V) \times |\Sigma V^{+} + V_A| \tag{6}$$

where $\Sigma V^{+}$ is the summed associative strength of all excitatory cues ($V > 0$) present.

The separable error term “modulators” in these equations ensure that the associative change undergone by a cue is influenced by the discrepancy between its own associative strength and the current outcome. The $|\lambda - V_A|$ term modulating excitatory learning in Equation 5 represents the discrepancy between the magnitude of the US occurring on this trial and the current associative strength of Cue A. The better Cue A predicts the current US, the smaller this modulator and hence the smaller the increment in associative strength undergone by A. On inhibitory trials, inhibitory learning is driven by the presence of cues predicting the occurrence of the US while no US actually occurs. That is, the driving force behind inhibitory learning is the prediction of the US made by all presented excitatory cues, $\Sigma V^{+}$. The $|\Sigma V^{+} + V_A|$ term modulating inhibitory learning in Equation 6 represents the discrepancy between this “inhibitory potential” and the current associative strength of Cue A. In this model inhibitory cues have a negative associative strength. As such, the better Cue A correctly predicts the absence of the US in the presence of excitatory cues, the smaller this modulator and hence the smaller the decrement in A’s associative strength.
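One way to read Equations 5 and 6 as an update rule is sketched below. The function name, the choice to compute all changes from pre-trial strengths before applying them, and the starting strengths in the example are assumptions made for illustration; the default parameter values are those quoted for the simulations in this section.

```python
def combined_trial(V, present, lam, alpha=0.5, beta=0.3):
    """One trial of the combined model (Equations 5 and 6), updating V in place.

    All changes are computed from pre-trial strengths and then applied together.
    """
    summed_error = lam - sum(V[c] for c in present)
    # Summed strength of the excitatory cues (V > 0) present on this trial.
    excitatory_sum = sum(V[c] for c in present if V[c] > 0)
    deltas = {}
    for c in present:
        if summed_error > 0:   # excitatory trial: Equation 5
            modulator = abs(lam - V[c])
        else:                  # inhibitory trial: Equation 6
            modulator = abs(excitatory_sum + V[c])
        deltas[c] = alpha * beta * summed_error * modulator
    for c, dv in deltas.items():
        V[c] += dv
    return deltas

# Hypothetical starting strengths: A is an excitor, B an inhibitor.
V = {"A": 0.7, "B": -0.4}
deltas = combined_trial(V, ["A", "B"], lam=0.8)
```

On this single reinforced AB trial both cues share the same summed error (0.5), but B's modulator (1.2) is far larger than A's (0.1), so B changes by about 0.09 and A by about 0.0075.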

These separable error term modulators ensure that for a compound cue the element that is the poorer predictor of the outcome will undergo the greater associative change. As a result this model is well equipped to account for the results of Rescorla’s (2000) investigation. This was confirmed by computational simulation of this study. Parameters used for this simulation were: $\alpha$ = .5, $\beta$ = .3, $\lambda$ (US present) = .8, $\lambda$ (US absent) = 0. It should be noted, however, that the predictions made by the model with regard to this study are parameter independent. Following the parameters of Rescorla’s experiment, Stage 1 training proceeded for 100 blocks, followed by eight AB+ trials in Stage 2. The results of the simulation are shown in Figure 1 (data for Cues C and D of Rescorla’s experiment are omitted from this figure for clarity). As expected, over the course of Stage 1 the associative strength of A increases, and that of B becomes increasingly negative to counter the excitation caused by the presence of X on BX– trials. And, as expected, on Stage 2 AB+ trials there is a greater increase in the associative strength of B than in that of A, a result of the fact that $|\lambda - V_B| > |\lambda - V_A|$ on these trials. Following AB+ training, the associative strength for compound AD (given by $V_A + V_D$) stands at .18, while that for BC is .64. This is, of course, the pattern indicated by the empirical results of Rescorla’s experiment.

Figure 1. Simulation of Rescorla’s (2000) Experiment 1a with a US-processing model that combines separable and summed error terms. Data for Cues C and D are omitted for clarity. Left panel: Mean associative strengths for Stage 1. Right panel: Mean associative strengths for Stage 2. During Stage 2, B undergoes a greater increment in associative strength than does A.
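The design of this simulation can be sketched as follows, using the parameters quoted above. The text does not specify the order of trials within a block or whether strengths are updated simultaneously, so those choices, like the function and variable names, are assumptions. The sketch is meant to show the qualitative pattern (a larger Stage 2 gain for B than for A, and BC stronger than AD) and is not claimed to reproduce the reported values exactly.

```python
def combined_trial(V, present, lam, alpha=0.5, beta=0.3):
    """One trial of the combined model (Equations 5 and 6), updating V in place."""
    summed_error = lam - sum(V[c] for c in present)
    excitatory_sum = sum(V[c] for c in present if V[c] > 0)
    deltas = {}
    for c in present:
        if summed_error > 0:   # Equation 5
            modulator = abs(lam - V[c])
        else:                  # Equation 6
            modulator = abs(excitatory_sum + V[c])
        deltas[c] = alpha * beta * summed_error * modulator
    for c, dv in deltas.items():
        V[c] += dv

LAM_US, LAM_NO_US = 0.8, 0.0
V = {cue: 0.0 for cue in "ABCDX"}

# Stage 1: 100 blocks of A+, C+, X+, BX-, DX- (order within a block assumed).
stage1_block = [(["A"], LAM_US), (["C"], LAM_US), (["X"], LAM_US),
                (["B", "X"], LAM_NO_US), (["D", "X"], LAM_NO_US)]
for _ in range(100):
    for present, lam in stage1_block:
        combined_trial(V, present, lam)
before = dict(V)  # strengths at the end of Stage 1

# Stage 2: eight AB+ trials.
for _ in range(8):
    combined_trial(V, ["A", "B"], LAM_US)

gain_a = V["A"] - before["A"]
gain_b = V["B"] - before["B"]
strength_ad = V["A"] + V["D"]
strength_bc = V["B"] + V["C"]
```

Comparing `gain_b` with `gain_a`, and `strength_bc` with `strength_ad`, gives the test comparison described in the text.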

Further support for this model integrating a summed error term with separable error term modulators comes from the results of Rescorla’s (2000) Experiment 2. This study employed exactly the same Stage 1 training regime as that for the experiment outlined above (A+, C+, X+, BX–, DX–), but the subsequent AB compound trials were nonreinforced. A and B might be expected to undergo a decrement in associative strength as a result of nonreinforcement; the question of interest is which undergoes the greater decrement. On test, conditioned responding to compound BC was greater than that to AD, indicating that the excitatory A underwent a greater decrement in strength than the inhibitory B. Once again this rules out the Rescorla–Wagner model’s assumption that path independence applies to compounds, and again it fits better with a model employing separable error terms. During the first stage, A is consistently paired with the US in isolation and hence will rapidly gain excitatory strength. The acquisition of inhibitory strength by B, on the other hand, is driven by its pairings with the excitatory X. Given that X is paired with the reinforcer on only one third of its presentations, it will be much slower than A to develop excitatory strength. As a consequence, inhibitory learning about B (driven by the excitatory strength of X) will be relatively slow, and certainly slower than development of excitatory strength by A (as shown in Panel A of Figure 1). On AB– trials, the US is not presented, and hence $\lambda = 0$. Given that $V_B$ is relatively small and negative, and $V_A$ is larger and positive, the separable error term modulator for B on AB– trials, $|0 - V_B|$, will be smaller than that for A, $|0 - V_A|$, with the result that A will undergo the greater decrement in associative strength on these trials. This in turn will lead to a greater associative strength for BC than for AD, as indicated by the empirical data. Computational simulation of this experiment confirms this prediction: Using exactly the same parameters as employed above yields an associative strength for compound AD of .05, and a strength for BC of .10.

So this combined US-processing model is able to account for the results of Rescorla’s (2000) study as a result of its use of separable error term modulators. The presence of a summed error term also allows the model to explain cue competition effects such as blocking in much the same way as does the Rescorla–Wagner model, by attaching an importance to the associative strength of the compound in the determination of associative change. Computational simulation using the parameters above confirms this. Thus 20 A+ trials, followed by 8 AB+ trials intermixed with 8 CD+ trials, result in $V_B$ = .12 and $V_D$ = .32. Thus prior conditioning of one element of a stimulus compound blocks learning about the other element when that compound is reinforced, as compared to a control compound made up of novel elements.
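The blocking design just described can be sketched with the same update rule. As before, simultaneous updating from pre-trial strengths and strict alternation of AB+ and CD+ trials are assumptions; under them the sketch should give values for B and D close to those quoted above.

```python
def combined_trial(V, present, lam, alpha=0.5, beta=0.3):
    """One trial of the combined model (Equations 5 and 6), updating V in place."""
    summed_error = lam - sum(V[c] for c in present)
    excitatory_sum = sum(V[c] for c in present if V[c] > 0)
    deltas = {}
    for c in present:
        if summed_error > 0:   # Equation 5
            modulator = abs(lam - V[c])
        else:                  # Equation 6
            modulator = abs(excitatory_sum + V[c])
        deltas[c] = alpha * beta * summed_error * modulator
    for c, dv in deltas.items():
        V[c] += dv

LAM = 0.8
V = {cue: 0.0 for cue in "ABCD"}

for _ in range(20):                      # Stage 1: A+
    combined_trial(V, ["A"], LAM)
for _ in range(8):                       # Stage 2: AB+ intermixed with CD+
    combined_trial(V, ["A", "B"], LAM)   # blocked compound
    combined_trial(V, ["C", "D"], LAM)   # control compound of novel cues
```

Because the AB and CD compounds share no cues, the order in which they are intermixed does not affect the outcome; `V["B"]` ends well below `V["D"]`.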

By combining separable and summed error terms in a single algorithm, we are able to resolve the potential conflict between (1) Rescorla’s (2000) finding that the distribution of associative change between the elements of a reinforced compound depends on the training history of the separable elements making up that compound, and (2) demonstrations of cue competition effects such as blocking (indicating that it is insufficient to view separable elements independently in the determination of associative change). The resulting model, like its antecedents (Bush and Mosteller’s, 1951, standard linear operator and Rescorla and Wagner’s, 1972, summed error term theory), specifies associative change to be governed solely by changes in processing of the US.

CS-PROCESSING MODELS

The Mackintosh (1975) model

There is, however, another possibility, and this is to ascribe cue competition effects to competition between CSs for processing power, rather than to variations in processing of the US. The amount of processing power secured by a given CS is reflected in its associability, $\alpha$. In the Bush and Mosteller and Rescorla–Wagner models outlined above, the associability of a cue is simply a fixed parameter depending on its intensity or salience. In a CS-processing model of learning, on the other hand, associability is a variable, able to change as a result of experience with a cue and with that cue’s predictive abilities.

One of the most influential models incorporating this notion of variable associability is the theory of selective attention proposed by Mackintosh (1975). This states that:

$$\Delta V_A = S \alpha_A (\lambda - V_A) \tag{7}$$

where $S$ is a learning rate parameter. This equation is clearly very similar to that for the Bush and Mosteller model outlined above. The difference is that Mackintosh allows the associability of a cue, $\alpha_A$, to change as a result of experience of a cue’s “predictiveness”, with animals proposed to devote more processing power to stimuli that are uniquely successful in their predictions. Specifically, Cue A maintains a high $\alpha$ to the extent that it is a better predictor of the outcome of the current trial than are all other cues present. Conversely, $\alpha$ will decrease if the outcome is predicted by other events at least as well as by A. The extent to which the outcome is predicted by A is represented by the absolute value of the error term $(\lambda - V_A)$. Hence on each learning episode $n$, following the adjustment of weights according to Equation 7, the associability for each presented cue is updated according to the following rules:

$$\begin{aligned}
\Delta\alpha_A^{n} &> 0 \quad \text{if } |\lambda^{n} - V_A^{n-1}| < |\lambda^{n} - V_X^{n-1}| \\
\Delta\alpha_A^{n} &< 0 \quad \text{if } |\lambda^{n} - V_A^{n-1}| \geq |\lambda^{n} - V_X^{n-1}|
\end{aligned} \tag{8}$$

where $\Delta\alpha_A^{n}$ is the change in associability of Cue A on trial $n$, $\lambda^{n}$ is the magnitude of the outcome occurring on trial $n$, $V_A^{n-1}$ is the associative strength of Cue A on trial $n - 1$ (i.e., the associative strength of A before it was updated on trial $n$), and $V_X^{n-1}$ is the associative strength of all stimuli other than A present on trial $n$ before associative strengths were updated on that trial. In other words, the change in associability is determined by how good a cue was at predicting the outcome of the current trial at the outset of that trial, before the associative strengths were adjusted. Mackintosh suggested that the size of the change in $\alpha_A$ on each trial should be proportional to the magnitude of these two inequalities, but gave no specific algorithm for computing this change.
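Since Mackintosh gave no specific algorithm, any concrete rule is necessarily an assumption. The sketch below uses one simple possibility, a change in associability proportional to the difference between the two absolute errors in Equation 8; the function name, the proportionality constant `theta`, and the example strengths are all hypothetical.

```python
def associability_change(v_a, v_others, lam, theta=0.5):
    """Assumed linear reading of Equation 8 (not Mackintosh's own algorithm).

    Positive when Cue A predicted the outcome better than the other cues
    present, negative when they predicted it better. With this proportional
    rule an exact tie gives zero change, the limiting case of the >= branch.
    """
    own_error = abs(lam - v_a)            # |lambda - V_A|, pre-trial
    others_error = abs(lam - v_others)    # |lambda - V_X|, pre-trial
    return theta * (others_error - own_error)

# A good predictor accompanied by a weak one: associability should rise.
good = associability_change(v_a=0.7, v_others=0.1, lam=0.8)

# A novel cue accompanied by a good predictor: associability should fall.
poor = associability_change(v_a=0.0, v_others=0.7, lam=0.8)
```

A full model would also need bounds on $\alpha$ (for example, clipping it to a fixed range), which are likewise left unspecified in the text.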

According to Mackintosh (1975), then, there are two factors governing the change in associative strength of a cue. First there is the error term, given by the discrepancy between $\lambda$ and the individual strength of that cue and hence driven by changes in processing of the US. The influence of this error term, as we saw with the Bush and Mosteller model, is modulated by the current associative strength of the cue, but not by how that associative strength was reached; the error term is path independent. Second there is $\alpha$, reflecting changes in CS processing and determined by the past relative predictive power of the cue. As such the influence of $\alpha$ is entirely determined by the cue’s associative history. Allowing associability to vary in this manner ensures that associative change is not solely determined by a cue’s associative strength; it is also affected by how that associative strength was achieved. The Mackintosh model does not make the assumption of path independence.

Furthermore, it is this $\alpha$ term that allows for the interaction of cues in the determination of associative change: Mackintosh’s (1975) model specifies competition between cues in terms of associability, rather than error. Consider again the blocking paradigm. Pretraining of A will establish it as a good predictor of the outcome. The novel cue, B, presented on subsequent AB+ trials will therefore be a poorer predictor of the outcome than will A, and so its associability will fall over the course of these trials (according to Equation 8), such that changes in $V_B$ on every trial but the first (when its associability has not yet had a chance to decline) will be smaller than those in a control group that has not had pretraining with A to establish it as a good predictor of the outcome.

By specifying competition between cues in terms of associability, rather than error, Mackintosh (1975) removed the immediate need for a summed error term in his calculation of $\Delta V$ (required implicitly by any US-processing model that purports to explain cue competition effects). Instead the model uses a separable error term of the form proposed by Bush and Mosteller (1951), and as a result it is in principle able to account for the results of Rescorla’s (2000) study of the distribution of associative change amongst the elements of a reinforced compound. In the first stage of this study A is consistently followed by an event of significance (reinforcement), and so is B (the absence of a predicted reinforcer). A and B are both good predictors of their respective “outcomes” (which for B is actually nonreinforcement), and as a result both will maintain a high $\alpha$ over the trials of Stage 1. On the initial AB+ trial in Stage 2, then, both A and B will be well processed (as the calculation to update associability is conducted after each trial). However, B will have a much larger separable error term on this trial than will A (see the discussion of Bush and Mosteller’s model earlier). Indeed, if Stage 1 training is sufficient to bring A’s associative strength near to asymptote ($\lambda$), its error term will be near zero, and as a result any increase in $V_A$ on this trial will be only very slight. Of course, following the updating of weights on this first trial, the modulation of associability outlined above will occur. The associability of A (a good predictor of occurrence of the US) will remain high, while that for B (a poor predictor of the occurrence of the US) will be reduced. So subsequent changes in $V_B$ will become increasingly smaller as $\alpha_B$ falls. However, given that $V_A$ was already near asymptote at the start of Stage 2, it will undergo little further increase over Stage 2 trials. In other words, the effect of Stage 1 training outweighs any influence of attentional modulation in Stage 2, such that the effective changes in the model are dominated by the Bush and Mosteller type error term.

The extended Mackintosh model

So the Mackintosh model is, in principle, able to account for the results of Rescorla’s (2000) experiment. In practice, however, a major obstacle remains. How can Mackintosh’s model account for the development of conditioned inhibition by B in the first place? Its use of a separable error term for each cue means that there is no way for the excitatory strength of X to drive the development of inhibitory strength by B as a result of nonreinforcement of the BX compound. It is hard to see how any theory of associative learning could provide a satisfactory account of conditioned inhibition without the use, either implicitly or explicitly, of a summed error term. Schmajuk and Moore (1985; see also Moore & Stickney, 1985) attempted to modify the Mackintosh model to allow it to account for conditioned inhibition and various related phenomena by adding a summed error term to the expression governing associative change and by allowing for the development of “antiassociations”, representing the prediction of nonreinforcement by a CS. This idea stems from Konorski’s (1967) suggestion that inhibitory learning reflects development of an association between a representation of the CS and a “no-US centre” ($\overline{\mathrm{US}}$), and that there exists an inhibitory relationship between US and $\overline{\mathrm{US}}$ centres such that if both are activated simultaneously, activity in the $\overline{\mathrm{US}}$ centre will inhibit activity in the US centre. It is these CS–$\overline{\mathrm{US}}$ associations that Schmajuk and Moore refer to as antiassociations. However, their formulation only allows for the development of antiassociations on trials on which the US is not presented. As such, it is unable to account for the well-established phenomenon of overexpectation (Khallad & Moore, 1996; Kremer, 1978; Lattal & Nakajima, 1998; Rescorla, 1970; Wagner, 1971): When two stimuli that have been separately paired with a US are presented in compound along with the same US, there is a reduction in the associative strength of each element. This reduction occurs despite the fact that these compound trials are reinforced, and hence it is outside the scope of Schmajuk and Moore’s model. Moreover, Kremer (1978) demonstrated that a novel cue presented on the reinforced compound trials of an overexpectation study would take on inhibitory properties, again indicating that reinforced trials can lead to the development of antiassociations. Furthermore, the model is unable to account for demonstrations of inhibitory conditioning resulting from a reduction in the magnitude of reinforcement. That is, if A is paired with a strong outcome, and AB is paired with a weak outcome, B is typically seen to acquire inhibitory properties despite the fact that it is consistently paired with an outcome, albeit a weak one (Cotton, Goodall, & Mackintosh, 1982; Mackintosh & Cotton, 1985; Wagner, Mazur, Donegan, & Pfautz, 1980). It is not easy to modify Schmajuk and Moore’s model to explain these findings without leading to problems with its account of other phenomena, such as blocking. As a result a new formulation is proposed here that is loosely based on Schmajuk and Moore’s model but borrows from the approach to inhibition offered by Pearce and Hall (1980) in their own model of associative learning (discussed later in this paper).

In this “extended Mackintosh model”, the effective strength of the reinforcer, R, on a trial is given by:

$$R = \lambda - (\Sigma V - \Sigma \bar{V}) \tag{9}$$

where $\Sigma \bar{V}$ is the summed associative strength of all presented stimuli for the $\overline{\mathrm{US}}$ representation. On the compound trials of a standard inhibitory conditioning procedure, where the US is not presented, this value will of course reduce to $\Sigma \bar{V} - \Sigma V$.

If R is positive (i.e., the predicted intensity of reinforcement is less than the actual magnitude of reinforcement such that this is a trial that will support excitatory learning), the strength of the CS–US association is increased according to the equation:

$$\Delta V_A = \alpha_A \beta_E \times (1 - V_A + \bar{V}_A) \times |R| \tag{10}$$

where $\beta_E$ is a learning-rate parameter for excitatory learning.

If R is negative (i.e., the predicted intensity of reinforcement is greater than the actual magnitude of reinforcement such that this is a trial that will support inhibitory learning), the strength of the CS–$\overline{\mathrm{US}}$ antiassociation is increased by:

$$\Delta \bar{V}_A = \alpha_A \beta_I \times (1 - \bar{V}_A + V_A) \times |R| \tag{11}$$

where $\beta_I$ is a learning-rate parameter for inhibitory learning. In order to account for demonstrations of conditioned responding resulting from lean schedules of partial reinforcement, it must be assumed that $\beta_E > \beta_I$ such that increases in associative strength on reinforced trials outweigh decreases on nonreinforced trials.

The net associative strength of a cue, $V^{NET}$, which determines the level of conditioned responding to that cue, is then given by:

$$V_A^{NET} = V_A - \bar{V}_A \tag{12}$$

Note that this new formulation (like that of Schmajuk & Moore, 1985) moves away from the schema offered by Mackintosh (1975) in that it employs a form of summed error term in the value of R: Equation 9 involves a comparison of the extent to which the US is predicted by all currently present cues, and the strength of the US actually presented. This summed error term determines the type of learning that occurs on a given trial, by dictating whether learning is excitatory (Equation 10) or inhibitory (Equation 11), and it also modulates the amount of learning undergone (the |R| factor in Equations 10 and 11). In common with Mackintosh’s original formulation, though, this extended model also employs a separable error term in the equations governing associative change ($1 - V_A + \bar{V}_A$ in Equation 10 and $1 - \bar{V}_A + V_A$ in Equation 11). This ensures that the net associative strength of each individual element of a compound, and not just the summed strength of that compound, has an effect on the associative change undergone by that element.²
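The update of associative strengths in Equations 9 to 12 can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation code used for the figures: the `Cue` class, the function name, and the argument names `beta_e` and `beta_i` (standing for the excitatory and inhibitory learning-rate parameters) are assumptions introduced here, and associability is held fixed.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    v: float = 0.0       # excitatory CS-US strength (V)
    v_bar: float = 0.0   # inhibitory CS-noUS "antiassociation" strength (V-bar)
    alpha: float = 0.8   # associability; held fixed in this sketch

    @property
    def v_net(self) -> float:
        # Equation 12: net strength is excitation minus inhibition.
        return self.v - self.v_bar


def update_strengths(present, lam, beta_e=0.3, beta_i=0.1):
    """Apply Equations 9-11 once to the cues in `present`; return R."""
    # Equation 9: summed error over every cue present on the trial.
    r = lam - sum(cue.v_net for cue in present)
    for cue in present:
        if r > 0:    # under-predicted outcome: excitatory learning (Equation 10)
            cue.v += cue.alpha * beta_e * (1 - cue.v + cue.v_bar) * abs(r)
        elif r < 0:  # over-predicted outcome: inhibitory learning (Equation 11)
            cue.v_bar += cue.alpha * beta_i * (1 - cue.v_bar + cue.v) * abs(r)
    return r


a = Cue()
r_first = update_strengths([a], lam=0.8)    # reinforced A+ trial
r_second = update_strengths([a], lam=0.0)   # nonreinforced A- trial
```

The summed term R is computed once per trial from all cues present, whereas the bracketed separable term is computed per cue, which is how the two kinds of error coexist in the model.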

All that remains is to specify the algorithm governing the change in associability of a cue on a given trial. Equation 8 sets out the general framework suggested by Mackintosh (1975), in which Cue A’s associability increases if it is a better predictor of the outcome of the current trial than are all other cues present and decreases if the outcome of the current trial is predicted by other stimuli at least as well as by A. There are of course a huge number of different algorithms for manipulating associability that are consistent with these principles, and there is little direct evidence to support one approach over another. One of the simplest practicable algorithms that remains close to the principles suggested by Mackintosh, and the one employed in the present formulation, is outlined below.


² Note that the specific separable error terms employed here require that the maximum possible value of $\lambda$ for any US is 1. As such, the simulations presented with this model employ $\lambda$ (US present) = .8, reflecting the fact that, while the USs typically used in these experiments (shock, food etc.) are certainly potent, more potent USs are surely possible. For the sake of consistency, this parameterization is used in all other simulations employed here where “standard” USs are employed.

After updating the associative strengths on a given trial according to Equations 10 or 11 above, the associability of each cue is adjusted. When R > 0, the change in associability of Cue A on trial n is defined by:

$$\Delta \alpha_A^{n} = -\theta_E \times \left( \left| \lambda^{n} - V_A^{n-1} + \bar{V}_A^{n-1} \right| - \left| \lambda^{n} - V_X^{n-1} + \bar{V}_X^{n-1} \right| \right) \tag{13}$$

and when R < 0 the change in associability is given by:

$$\Delta \alpha_A^{n} = -\theta_I \times \left( \left| |R^{n}| - \bar{V}_A^{n-1} + V_A^{n-1} \right| - \left| |R^{n}| - \bar{V}_X^{n-1} + V_X^{n-1} \right| \right) \tag{14}$$

The n – 1 suffixes are intended to indicate that the values of the associative strengths before they are updated on trial n are used to determine the change in associability of Cue A on trial n. $V_X$ represents the combined associative strengths of all stimuli other than A presented on trial n (and $\bar{V}_X$ represents the combined inhibitory strength of all stimuli other than A). $\theta_E$ and $\theta_I$ are learning-rate parameters for changes in $\alpha$ on excitatory and inhibitory trials respectively. Consider the A+, AB– trials of a standard conditioned inhibition contingency. $\alpha_A$ will rise on A+ trials, where A is the best predictor of reinforcement, and will fall on AB– trials, where it is a poorer predictor of the absence of reinforcement than is B. Hence in order to allow A to maintain a high associability across these trials, we must assume that $\theta_E > \theta_I$, such that the increases in $\alpha_A$ on A+ trials outweigh the decreases on AB– trials.

Finally, we limit $\alpha$ to be in the range from .05 to 1. In other words, following the calculation in Equation 13 or 14:

$$\text{If } \alpha_A \geq 1 \text{ then } \alpha_A = 1; \quad \text{if } \alpha_A \leq .05 \text{ then } \alpha_A = .05 \tag{15}$$

A lower limit of .05, rather than zero, is used in order to prevent stimuli from ever becoming completely “frozen out” of the learning process.
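The associability rule of Equations 13 to 15 can be sketched as a single function. The function and argument names are assumptions made for illustration (`theta_e` and `theta_i` stand for the two associability learning rates), and the treatment of a trial on which R is exactly zero is also an assumption, since the equations above do not cover that case; here associability is simply left unchanged.

```python
def update_alpha(alpha, lam, r, v_a, vbar_a, v_x, vbar_x,
                 theta_e=0.3, theta_i=0.1, floor=0.05, ceiling=1.0):
    """Return Cue A's new associability after one trial.

    All V arguments are the pre-update (trial n - 1) values; the `_x`
    arguments are summed over every cue other than A present on the trial.
    """
    if r > 0:
        # Equation 13: compare A's prediction error with that of the other cues.
        delta = -theta_e * (abs(lam - v_a + vbar_a) - abs(lam - v_x + vbar_x))
    elif r < 0:
        # Equation 14: same comparison, against |R| on inhibitory trials.
        delta = -theta_i * (abs(abs(r) - vbar_a + v_a)
                            - abs(abs(r) - vbar_x + v_x))
    else:
        delta = 0.0  # assumption: no change when R is exactly zero
    # Equation 15: keep associability within [.05, 1].
    return min(ceiling, max(floor, alpha + delta))


# First AB+ trial of a blocking design, with A pretrained to V = .7 and B novel.
alpha_b = update_alpha(0.8, lam=0.8, r=0.1, v_a=0.0, vbar_a=0.0, v_x=0.7, vbar_x=0.0)
alpha_a = update_alpha(0.8, lam=0.8, r=0.1, v_a=0.7, vbar_a=0.0, v_x=0.0, vbar_x=0.0)
```

With these illustrative inputs the novel cue B drops from .8 to about .59, while the pretrained cue A rises and is capped at 1.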

This modified Mackintosh model behaves very much like the original with regard to contingencies that would not allow for the development of antiassociations (e.g., simple acquisition, overshadowing, blocking). This was confirmed by computational simulation. Figure 2 shows the results of a simulation of a blocking experiment (20 A+ trials, followed by 8 AB+ trials intermixed with 8 CD+ trials) using the model outlined above. Parameters used for this simulation, and all other simulations with this model presented in this paper, were: $\beta_E = .3$, $\beta_I = .1$, $\theta_E = .3$, $\theta_I = .1$, starting value of $\alpha$ = .8, $\lambda$ (US present) = .8, $\lambda$ (US absent) = 0. Note, however, that within the constraint that $\beta_E > \beta_I$ and $\theta_E > \theta_I$, the predictions made by the model are parameter independent. Panel A of Figure 2 shows the changes in associative strength for Cues A, B and C/D (these latter cues are equivalent and hence have been averaged). Note that this graph shows the net associative strength of these cues; that is, $V^{NET}$ for Cue A is given by $V_A - \bar{V}_A$. As outlined above, prior conditioning of A establishes it as a better predictor of the US than is B on compound trials, causing a rapid decline in B’s associability across Stage 2 AB+ trials and hence reducing its ability to engage the learning process. This can be compared with Cues C and D, which are novel on Stage 2 CD+ trials. Neither cue is a poorer predictor of the US than the other, and so both maintain a high associability. The result is a greater excitatory associative strength for C/D than for B.
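The blocking design described above (20 A+ trials, then 8 AB+ trials intermixed with 8 CD+ trials) can be reproduced in outline with the following self-contained sketch. It is an independent reconstruction from Equations 9 to 15 and the stated parameters, not the code behind Figure 2; the dictionary layout, the function names, strict alternation of AB+ and CD+ trials, and the rule that associability is unchanged when R is exactly zero are all assumptions.

```python
BETA_E, BETA_I = 0.3, 0.1        # learning rates for V and V-bar
THETA_E, THETA_I = 0.3, 0.1      # learning rates for associability
ALPHA_MIN, ALPHA_MAX = 0.05, 1.0


def new_cue():
    return {"v": 0.0, "vbar": 0.0, "alpha": 0.8}


def net(cue):
    return cue["v"] - cue["vbar"]                      # Equation 12


def run_trial(cues, present, lam):
    """Run one trial with the cues named in `present` and outcome `lam`."""
    # Snapshot the pre-trial (n - 1) strengths used by every equation.
    before = {name: (cues[name]["v"], cues[name]["vbar"]) for name in present}
    r = lam - sum(v - vbar for v, vbar in before.values())   # Equation 9
    for name in present:
        cue = cues[name]
        v_a, vbar_a = before[name]
        v_x = sum(v for other, (v, _) in before.items() if other != name)
        vbar_x = sum(vb for other, (_, vb) in before.items() if other != name)
        if r > 0:      # Equations 10 and 13
            cue["v"] += cue["alpha"] * BETA_E * (1 - v_a + vbar_a) * abs(r)
            d_alpha = -THETA_E * (abs(lam - v_a + vbar_a)
                                  - abs(lam - v_x + vbar_x))
        elif r < 0:    # Equations 11 and 14
            cue["vbar"] += cue["alpha"] * BETA_I * (1 - vbar_a + v_a) * abs(r)
            d_alpha = -THETA_I * (abs(abs(r) - vbar_a + v_a)
                                  - abs(abs(r) - vbar_x + v_x))
        else:          # assumption: nothing changes when R is exactly zero
            d_alpha = 0.0
        # Equation 15: clamp associability after the strengths are updated.
        cue["alpha"] = min(ALPHA_MAX, max(ALPHA_MIN, cue["alpha"] + d_alpha))
    return r


def simulate_blocking():
    cues = {name: new_cue() for name in "ABCD"}
    for _ in range(20):                 # Stage 1: A+
        run_trial(cues, "A", 0.8)
    for _ in range(8):                  # Stage 2: AB+ intermixed with CD+
        run_trial(cues, "AB", 0.8)
        run_trial(cues, "CD", 0.8)
    return cues


if __name__ == "__main__":
    for name, cue in simulate_blocking().items():
        print(name, round(net(cue), 3), round(cue["alpha"], 3))
```

In this sketch B finishes Stage 2 with far less net strength than C or D, and its associability is driven down to the .05 floor, while C and D keep their starting associability because neither is the better predictor.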

Crucially, this extended Mackintosh model is now able to provide a satisfactory account of the results of Rescorla’s (2000) study of associative change in excitors and inhibitors conditioned in compound, as it is equipped to explain the development of conditioned inhibition. Figure 3 shows results of a simulation of this experiment: The data for Cues C and D of Rescorla’s experiment are omitted from this figure for clarity. During Stage 1 (left panels), X+ trials result in this cue taking on net excitatory strength, and this drives the development of net inhibitory strength by Cue B. As A is the best predictor of the US on A+ trials, it maintains a high associability. Likewise, on X+ trials the associability of X will increase. However, on BX– and DX– trials, X is a poorer predictor of the outcome (nonreinforcement) than is B or D, and hence on these trials its associability will fall (while that for B and D will rise, such that $\alpha_B$ and $\alpha_D$ remain high throughout Stage 1). Given the parameterization outlined above (with $\theta_E > \theta_I$), once X starts to become reasonably excitatory and B/D reasonably inhibitory, the increases in $\alpha_X$ tend to outweigh the decreases, and X comes to maintain a high associability. On the initial AB+ trials of Stage 2 (right panels), the greater separable error term for B ($1 - V_B + \bar{V}_B$ in Equation 10) results in a greater increment in the net associative strength of this cue than for A. Given that B is a poorer predictor of reinforcement than is A on AB+ trials, however, its associability undergoes a rapid decline, greatly reducing subsequent increments in its excitatory strength. The overall result, though, is a greater associative change for B than for A: Net associative strength for the AD compound is .26, while that for BC is .58.


Figure 2. Simulation of a blocking experiment using the extended Mackintosh model. Upper panel: Net associative strength ($V^{NET} = V - \bar{V}$) for Cues A, B, and C/D (Cues C and D undergo identical training and hence are equivalent to one another; consequently the results for these two cues have been averaged). During Stage 2, B gains less excitatory strength than the control cues, C/D, thus demonstrating blocking. Lower panel: Associability changes for A, B, and C/D. $\alpha_B$ falls during Stage 2 as it is a poorer predictor of the US than is A.

This extended Mackintosh model is able to account for the results of Rescorla’s (2000) Experiment 2, in which the AB compound was nonreinforced in Stage 2, in similar fashion. On these Stage 2 AB– trials, A, fuelled by its greater separable error term, undergoes greater associative change than does B (while $\alpha_A$ declines as it is a poorer predictor of nonreinforcement than is B). Simulation confirms this pattern of results: following Stage 2, compound AD yields net associative strength of .06, while BC yields .14.

Figure 3. Simulation of Rescorla’s (2000) Experiment 1a with the extended Mackintosh model. Left panels show data for Stage 1; right panels for Stage 2. Upper panels show net associative strength; lower panels show associability. During Stage 2, B undergoes a greater increment in net associative strength than does A.

Unlike Schmajuk and Moore’s (1985) model, this implementation is also able to account for experiments demonstrating reductions in net associative strength of cues on reinforced trials. For instance, it provides a good account of overexpectation. Panel A of Figure 4 shows simulation results for an overexpectation preparation, in which 100 blocks of intermixed A+ and B+ trials are followed by 8 AB+ trials.³ On compound trials, the combined associative strength of A and B is greater than that supported by the outcome of the trial (i.e., $\lambda - (\Sigma V - \Sigma \bar{V}) < 0$), such that these trials will support inhibitory learning (development of CS–$\overline{\mathrm{US}}$ antiassociations). Hence compound trials lead to a reduction in net associative strength of A and B, despite the fact that these trials are reinforced. Panel B shows data from a simulation of Kremer’s (1978) overexpectation experiment, in which 100 intermixed A+ and B+ trials are followed by 8 ABC+ trials. The negative summed error term on these latter trials causes C to take on net inhibitory properties, again despite the fact that this cue is consistently paired with the US, in agreement with the results of Kremer’s empirical study. In addition, the model is able to account for demonstrations of inhibitory conditioning resulting from a reduction in reinforcer magnitude. Panel C shows data from a simulation of a typical study, employing 100 trials on which A is paired with a strong outcome ($\lambda$ = .8) intermixed with 100 trials on which AB is paired with an intermediate outcome ($\lambda$ = .4). B can be seen to take on net inhibitory strength as a result.

³ As a result of space constraints, the data for associability changes are not shown for any of the experiments whose simulation results are shown in Figure 4. In an overexpectation paradigm, the associability of A and B remains high throughout as both are good predictors of reinforcement in the first stage, and neither is a better predictor than the other in the second stage. In Kremer’s (1978) overexpectation preparation, the associability of C falls on Stage 2 ABC+ trials, as it is a poorer predictor of reinforcement than A or B. In a “reduced reinforcement inhibition” design, both A and B will ultimately maintain a high associability, although $\alpha_A$ will take longer to achieve this as it is only inconsistently followed by a strong reinforcer. Finally, in a superconditioning design (A+, AB–, then AB+) the associability changes are much as for Cues A and B in Rescorla’s (2000) experiment, as shown in Figure 3.
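The sign argument for overexpectation can be made concrete with a short worked example. The numbers are idealised assumptions rather than simulation output: both cues are taken to sit exactly at asymptote after Stage 1 (V = .8 each, with no antiassociation yet), associability is taken to be 1, and the inhibitory learning rate (written $\beta_I$ here) is .1 as in the stated parameters.

```latex
\begin{align*}
R &= \lambda - (\Sigma V - \Sigma \bar{V}) = .8 - (.8 + .8 - 0) = -.8 < 0 \\
\Delta \bar{V}_A &= \alpha_A \beta_I \times (1 - \bar{V}_A + V_A) \times |R|
  = 1 \times .1 \times (1 - 0 + .8) \times .8 = .144 \\
V_A^{NET} &= V_A - \bar{V}_A = .8 - .144 = .656
\end{align*}
```

Under these assumptions a single reinforced AB+ trial lowers the net strength of A from .8 to .656, and by symmetry the same holds for B.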

Figure 4. Further simulations using the extended Mackintosh model. Panel A: Overexpectation (A+ B+ then AB+). Panel B: Kremer’s (1978) study of overexpectation (A+ B+ then ABC+). Panel C: Conditioned inhibition resulting from a reduction in reinforcer magnitude on compound trials (A → strong outcome, AB → intermediate outcome). Panel D: Supernormal conditioning, following Pearce and Redhead (1995).

The use of a summed error term also allows this extended Mackintosh model to account for the related phenomena of superconditioning and supernormal conditioning. Superconditioning refers to the observation that the effectiveness of excitatory conditioning is enhanced when it takes place in the presence of a conditioned inhibitor. Thus greater conditioned responding is observed following AB+ training if B has previously been trained as an inhibitor of that US, compared to the situation in which B is neutral (Navarro, Hallam, Matzel, & Miller, 1989; Pearce & Redhead, 1995; Rescorla, 1971; Wagner, 1971; Williams & McDevitt, 2002). Supernormal conditioning refers to the observation that, in certain circumstances, reinforcement of a CS in the presence of a conditioned inhibitor can result in that CS acquiring a greater-than-asymptotic level of associative strength (Pearce & Redhead, 1995). If A has asymptotic excitatory strength (i.e., $V_A^{NET} = \lambda$), and B has net inhibitory strength ($V_B^{NET} < 0$), the overall strength of the AB compound will be less than $\lambda$. Thus the inhibitory strength of B ensures that the summed error term (R) on AB+ trials is greater than zero, allowing excitatory learning on these trials such that the associative strength of A will grow to a level that exceeds the asymptote set by the US. Once again the model’s ability to account for superconditioning and supernormal conditioning was assessed by means of computational simulation. In the first stage, simulated subjects received trial types A+, AB–, C+, DE–. This should endow B with net inhibitory strength, while leaving E neutral. Stage 2 comprised AB+ and CE+ trials. The presence of the inhibitor, B, should enhance A’s associative strength on AB+ trials as compared to C, trained in the presence of a neutral cue. Moreover, reinforcement in the presence of B should allow A to develop an associative strength greater than $\lambda$. Following the parameters of Pearce and Redhead’s study, Stage 1 training proceeded for 120 blocks, while Stage 2 comprised 18 blocks. Panel D of Figure 4 shows the averaged results for 8 simulated subjects ($V_D^{NET}$ remained at 0 throughout). Superconditioning is demonstrated in that, following Stage 2, A has achieved a higher excitatory strength than C. Supernormal conditioning is demonstrated in that the net associative strength of A (.90) at this point is greater than $\lambda$ (.8).
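The supernormal case runs the same arithmetic in the opposite direction. The values below are purely illustrative assumptions: A is placed exactly at asymptote, B is given a net strength of –.5 (a figure chosen for the example, not reported in the simulation), associability is 1, and the excitatory learning rate (written $\beta_E$ here) is .3. The separable term $1 - V_A + \bar{V}_A$ is written as $1 - V_A^{NET}$, which is the same quantity.

```latex
\begin{align*}
R &= \lambda - (V_A^{NET} + V_B^{NET}) = .8 - (.8 - .5) = .5 > 0 \\
\Delta V_A &= \alpha_A \beta_E \times (1 - V_A^{NET}) \times |R|
  = 1 \times .3 \times (1 - .8) \times .5 = .03 \\
V_A^{NET} &= .8 + .03 = .83 > \lambda
\end{align*}
```

Because the separable term is bounded by 1 rather than by $\lambda$, A can continue to gain strength beyond .8 for as long as B keeps the summed error positive.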

Finally, the combination of summed and separable error terms in the extended Mackintosh model allows it to account for the results of a further study of the distribution of associative change among the elements of a compound by Rescorla (2002). In the first stage of this study, A was paired with a strong reinforcer, while AB was nonreinforced, such that B became a conditioned inhibitor. Following this training, the AB compound was paired with a reinforcer of intermediate strength. Rescorla noted that this treatment caused both A and B to undergo increases in excitatory associative strength, but that the increase for B was greater than that for A. Figure 5 shows simulation results for this study using the extended Mackintosh model. This simulation employed 64 blocks in each stage, as for Rescorla’s empirical study; parameters were as described above, with $\lambda$ = .8 for the strong reinforcer and $\lambda$ = .4 for the intermediate reinforcer. During Stage 1, A takes on excitatory strength and B inhibitory strength. As a result, the overall associative strength of the AB compound following Stage 1 training tends to an asymptote of 0 with increased training, as B counteracts the excitatory influence of A. Notably the associative strength of the AB compound will rapidly fall below the asymptote of strength supported by the intermediate reinforcer (.4). Hence on Stage 2 trials the summed error term ($.4 - V_A^{NET} - V_B^{NET}$) is positive, promoting excitatory learning about A and B. However, the separable error term is greater for B (the poorer predictor of reinforcement), and hence this cue undergoes greater associative change on the initial Stage 2 trials. Given that B is a poorer predictor of reinforcement than A, its associability declines rapidly, leading to a reduction in subsequent increments in $V_B^{NET}$. Nevertheless, the impact of the early trials is such that, overall, a greater associative increment is seen for B than for A. So by combining separable and summed error terms, this model is again able to provide a better characterization of the effects of associative history on the associative change undergone by the elements of a compound than a model employing only a summed error term (which would incorrectly predict equal change for A and B) or only a separable error term (which could not explain the development of conditioned inhibition in the first place).

So far, then, we have two options for reconciling the independence of the elements of a compound in the determination of associative change, as demonstrated by Rescorla (2000), with the interaction of cues demonstrated by blocking, conditioned inhibition, and related phenomena of cue competition. One is to combine a separable error term with a summed error term in a US-processing model of learning. The other is to allow for variations in the processing of the CS as an explanation of cue competition effects, as suggested by Mackintosh (1975). But we saw above that if the approach offered by Mackintosh is to provide a satisfactory account of conditioned inhibition, a summed error term must be built into the model, if not as explicitly as for the Rescorla–Wagner formulation. Given that the combination of a summed and separable error term (as employed by the extended Mackintosh model) is able to resolve the conflict presented by the results discussed so far, is there any reason to specify a further parameter, $\alpha$, allowing for competition between CSs for processing power?

EVIDENCE FOR ASSOCIABILITY PROCESSES

Studies of discrimination learning

One common approach to investigating the effects of previous learning on the associability of stimuli has been to examine the extent to which the learning of one discrimination transfers to a subsequent novel discrimination in which the same, or similar, stimuli remain relevant. The problem with this approach lies in ensuring that any positive transfer (better performance) observed cannot be accounted for solely in terms of direct transfer of learning about the stimuli involved in the original discrimination, rather than an increase in the associability of those stimuli (i.e., their readiness to engage the learning process in the novel discrimination).

Figure 5. Simulation of Rescorla’s (2002) study with the extended Mackintosh model. Upper panel shows net associative strength, lower panel shows associability. During Stage 2, both A and B undergo an increase in net associative strength, but the change for B is greater than that for A.

Among the earliest studies to address this problem successfully were those looking at the rate of acquisition of reversals of previously learnt discriminations. Consider the situation in which rats are trained on a black–white discrimination in which black is the reinforced stimulus (S+), and white is nonreinforced (S–). This should result in black acquiring excitatory associative strength and white acquiring inhibitory strength. The discrimination is then reversed, such that white becomes the S+ and black the S–. Any transfer of associative strength from the original discrimination to this reversed version, it is argued, should lead to negative transfer (poorer performance), as the stimuli predict inappropriate responses immediately following reversal. Reid (1953) conducted just such an experiment. Rats were trained on a black–white discrimination until they reached a criterion of 9 correct responses in 10 trials. One group was then immediately transferred to the reversed version of this contingency, while the other received a further 150 “overtraining” trials on the original discrimination before reversal. Surprisingly, Reid found that rats overtrained on the original discrimination were faster to learn its reversal than those rats trained only to criterion. This facilitated learning clearly could not be a result of direct transfer of associative strength from the original discrimination: On this basis, extended training on the original discrimination would be expected to result in increased negative transfer, rather than the positive transfer that was observed. A detailed empirical analysis of this overtraining reversal effect (ORE) by Mackintosh (1969) established it as a reliable phenomenon of discrimination learning, whilst also indicating that the circumstances under which it would be observed were rather specific.

Mackintosh (1969; see also Sutherland & Mackintosh, 1971) noted that the ORE could be accounted for by a theory allowing for learned changes in the associability of stimuli. During the first stage, black and white are established as good predictors of reinforcement and nonreinforcement, respectively, and hence their associability, or the attention paid to them (Mackintosh uses the terms “associability” and “attention” interchangeably), might be expected to increase. Any increase in associability of these stimuli would be expected to facilitate reversal, as the stimuli will develop associations to the new response assignments more rapidly. As long as the positive effects of associability changes outweigh the negative effects of direct transfer of associative strengths from the original discrimination to the reversal, net positive transfer to the reversed discrimination will be observed. Overtraining might be one factor influencing this balance: we need only assume that overtraining has a greater effect on strengthening associability than it does on strengthening specific stimulus–response associations. Hence the extra negative transfer produced by the slightly stronger stimulus–response associations in the overtrained group will be outweighed by the much higher associability of S+ and S– in this group. Moreover, Sutherland and Mackintosh suggested that $\alpha$ might be “sticky”, such that high values of $\alpha$ can be reduced only slowly. This suggestion further aids explanation of the ORE. Immediately following reversal, the former S+ and S– are now poor predictors of their respective outcomes and hence might be expected to lose associability until the new stimulus–response mappings have been learnt sufficiently. However, if the higher values of $\alpha$ reached by the overtrained group are sticky, the reduction in associability on reversal will be reduced for this group, thus facilitating more rapid learning of the reversal. Suret and McLaren (2003) present another model based in part on Mackintosh (1975), which incorporates this notion of sticky alpha in order to account for their results concerning discrimination reversal learning following overtraining in humans.

Experiments on spatial learning can also be seen as studies of discrimination learning: Animals must learn that a certain location (defined by a particular configuration of landmarks) is rewarded, while other locations are not. And, just as for more “standard” discrimination learning paradigms, learned associability seems to play an important role in spatial learning. Prados, Redhead, and Pearce (1999) demonstrated that rats were faster to learn the location of a hidden platform in a Morris swimming pool when this location was defined by a configuration of landmarks if these landmarks had previously been consistent predictors of a different platform location than if they were novel. Likewise, Redhead, Prados, and Pearce (2001) demonstrated that rats were slower to learn the relationship between landmarks and platform location if the landmarks had previously been inconsistent predictors of platform location than if they were novel. Taken together, these results provide strong evidence for a role of learned predictiveness in determining the rate of spatial learning. Moreover, this learned predictiveness effect seems to operate in accordance with the Mackintosh (1975) model. Pretraining of the landmarks as consistent predictors of platform location would allow them to maintain a high associability, promoting learning about the relationship between the landmarks and the new platform location on test. Pretraining of the landmarks as inconsistent predictors of platform location would cause their associability to decline, slowing learning about the relationship between these cues and the new platform location on test.

Further powerful evidence for a role of associability processes in discrimination learning comes from studies of discriminational shifts, in which the response requirements are maintained between training and transfer discriminations, but the stimuli are changed in such a way that any influence of direct transfer of associative strengths is eliminated. Consider the case of stimuli that can vary on two independent dimensions, say colour and shape. In the initial, training, discrimination, the stimuli can be either blue or yellow and either a circle or a triangle. In the second, transfer, discrimination, the stimuli can be either red or green, and either a square or a diamond. In an extradimensional (ED) shift, training and transfer discriminations are conducted along different dimensions. In the training discrimination the circular stimulus might serve as S+ and the triangular stimulus as S–, with the colour of the stimulus being irrelevant. In the transfer discrimination it is now the other dimension, colour, that is relevant, with red as S+ and green as S–, while shape is irrelevant. In an intradimensional (ID) shift, the same dimension is relevant for both training and transfer discriminations. In the training discrimination the blue stimulus might serve as S+ and the yellow stimulus as S–, with the shape of the stimulus being irrelevant. The transfer discrimination is exactly the same as that for the ED shift group (red as S+, green as S–). So for the ID shift group colour is relevant in both discriminations. The fact that different colours are used in training and transfer discriminations (along with suitable counterbalancing of the stimuli serving as S+ and S–) ensures that direct transfer of associative strength cannot influence the rate of learning of the transfer discrimination. Hence any advantage in learning the second discrimination shown by the ID group, as compared to the ED group, can only be accounted for in terms of changes in the associability of, or attention paid to, relevant or irrelevant stimuli. And such an advantage for intradimensional shifts over extradimensional shifts has been observed in a large number of studies, using monkeys (Roberts, Robbins, & Everitt, 1988; Shepp & Schrier, 1969), rats (Oswald, Yee, Rawlins, Bannerman, Good, & Honey, 2001; Schwartz, Schwartz, & Teas, 1971), pigeons (George & Pearce, 1999; Mackintosh & Little, 1969), and humans (Whitney & White, 1993).

The ID/ED shift effect raises the question of the level at which associability applies to stimuli. In their “analyser model”, Sutherland and Mackintosh (1971) proposed that associability applies to whole dimensions, such that animals may learn to attend to one dimension (e.g., colour) rather than another (e.g., shape) as a result of experience. This approach lends itself easily to explanation of the ID/ED shift effect. During Stage 1 animals in the ID shift group will learn to attend more to colour than to shape. Given that colour is also relevant in the transfer discrimination, this learned attention will facilitate learning of the transfer discrimination compared to an ED shift group for which initial training will result in shape (irrelevant on the transfer discrimination) receiving more attention than colour. We saw above that the Mackintosh (1975) model, in contrast, states that associability applies to individual cues or features, rather than whole dimensions. In order to account for the superiority of ID shifts (where the stimuli are changed between training and transfer discriminations) Mackintosh suggested that associability could generalize from one stimulus to another as a function of their similarity. It seems reasonable to assume that features from the same dimension will tend to be more similar to one another than are features from different dimensions (red is more similar to yellow than it is to a triangle). As a result there will be a greater generalization of associability from the features relevant to the training discrimination to those relevant to the transfer discrimination in the ID group (as the relevant cues from the two discriminations have relatively high similarity) than in the ED group (as the relevant cues from training have lower similarity to the relevant cues on transfer).

Recent evidence from a study of spatial learning by Trobalon, Miguelez, McLaren, and Mackintosh (2003) indicates that associability applies to individual stimuli rather than to dimensions. In each of their experiments they trained two groups of rats in a radial arm maze. One group was trained on a visuo-tactile discrimination between two distinctive floor coverings. Reinforcement was always associated with a particular floor covering, while the direction that the arm pointed to was irrelevant. The other group was trained on a spatial discrimination, in which the reinforced (S+) and nonreinforced (S–) arms were defined by the directions they pointed to, regardless of the floor coverings. Both groups of rats were then transferred to a novel spatial discrimination, in which S+ and S– pointed in two new directions never experienced during training. Hence for rats originally trained on a visuo-tactile discrimination this constitutes an ED shift, while for rats originally trained on a spatial discrimination it is an ID shift.

In Experiment 1 rats in the ID group were trained to discriminate an arm pointing north from arms pointing east or west. These rats were faster to learn a subsequent spatial discrimination between arms pointing south-east and south-west than rats from an ED group who had received a visuo-tactile discrimination in the first stage. The superior performance of the ID group could result from the rats learning to attend to the dimension of spatial landmarks during training (the approach taken by Sutherland & Mackintosh, 1971). Alternatively, the rats may learn to attend to specific landmarks that define the positions of S+ and S– (those defining north, east, and west) and to ignore other cues that were irrelevant to the solution of the problem. These irrelevant cues would include any landmarks that do not serve to differentiate between S+ and S– (e.g., those lying midway between the two arms, defining north-east and north-west). Greater attention to landmarks defining east and west might then facilitate subsequent discrimination of arms pointing south-east and south-west.

In order to discriminate between these alternative hypotheses, Trobalon et al. (2003) conducted a further experiment in which rats in the ID group were initially trained to discriminate an arm pointing south from arms pointing east or west. These rats were substantially slower to learn a subsequent discrimination between south-east and south-west arms than were ED rats initially trained on a visuo-tactile discrimination. It is hard to see how Sutherland and Mackintosh’s (1971) view of dimensional associability can account for this result. Just as in the previous experiment, initial training for the ID group should result in greater attention paid to the spatial dimension, which should facilitate subsequent learning of the transfer discrimination: Sutherland and Mackintosh’s view predicts that ID training should always result in superior transfer performance to ED training. Mackintosh’s view of stimulus-specific associability, on the other hand, provides a ready explanation. During initial training, ID rats will learn to attend to landmarks defining south, east, and west, and to ignore those defining south-east and south-west (as these landmarks do not serve to differentiate south, east, and west arms and are therefore irrelevant to this discrimination). These ignored landmarks, however, are exactly those that are relevant to solution of the transfer discrimination between south-east and south-west arms. Moreover the landmarks defining south, which the rats will have learnt to attend to strongly in Phase 1, are common to both south-east and south-west arms in Phase 2 and so might be expected to interfere with learning of this transfer discrimination. As a result Mackintosh (1975) is able to account for the finding that, under some conditions, an ID shift will be learnt more slowly than an ED shift. These results imply that, in the spatial domain at least, associability applies to individual cues, rather than to dimensions.

Studies of conditioning

We saw above that phenomena of discrimination learning such as the overtraining reversal effect and intradimensional shift advantage provide strong evidence for a role of associability processes in learning: the processing received by a cue can be influenced by the associative history of that cue. Moreover the results support the view of associability suggested by Mackintosh (1975), wherein good predictors of an outcome maintain high associability, while the associability of poorer predictors declines. A number of studies of Pavlovian conditioning have also indicated that it is insufficient to view learning as driven solely by changes in processing of the US; processing of CSs plays its part too.

One of the most well-known phenomena indicating a role of associability in conditioning is that of learned irrelevance. The Rescorla–Wagner model predicts that if a CS and a US are positively correlated, excitatory conditioning to the CS will occur; if they are negatively correlated, inhibitory conditioning will be observed. If the two are uncorrelated, such that the US is as likely to occur in the presence of the CS as in its absence, then the CS will end up with zero associative strength and hence be indistinguishable from a novel stimulus. Contrary to this prediction, Mackintosh (1973; see also Baker & Mackintosh, 1977) demonstrated that rats given uncorrelated exposure to a tone and water were slower to learn about a contingent tone–water relationship in a subsequent conditioning phase than rats given no preexposure to either tone or water. Something must have been learnt during uncorrelated exposure to the CS and US. Could it be that uncorrelated CS/US exposure leads to development of an inhibitory CS–US association, which would then slow acquisition of excitatory conditioning in Phase 2? This seems unlikely, as uncorrelated exposure also leads to a retardation in subsequent inhibitory conditioning of the CS (Baker & Mackintosh, 1979; Bennett, Wills, Oakeshott, & Mackintosh, 2000; but see Bonardi & Ong, 2003). It would seem that past experience of the tone’s irrelevance with respect to the delivery of water results in the representation of the tone becoming less ready to enter into association (either excitatory or inhibitory) with the representation of water. This is in line with the predictions of Mackintosh’s (1975) model. During uncorrelated exposure the tone is a poorer predictor of the delivery of water than is the training context itself; as such its associability will fall, with the result that it will be slower to enter into subsequent association with water than if it were novel. On this view, then, while uncorrelated exposure does not result in development of a CS–US association, it is effective in causing changes in the processing power devoted to that CS that will affect the subsequent development of such an association.

There is, however, an alternative explanation of the learned irrelevance effect. It has long been known that prior exposure to a stimulus leads to a retardation in the rate of subsequent conditioning to that stimulus. This CS preexposure effect is known as latent inhibition, and it has been reliably demonstrated in a large number of animal species (see Lubow, 1989). In addition, preexposure to the US alone can also lead to retardation of subsequent conditioning (Randich & LoLordo, 1979). Could learned irrelevance be a simple consequence of the CS preexposure and/or US preexposure effects, negating the need for explanation in terms of associability changes resulting from experience of a noncontingent relationship between CS and US? In their studies of learned irrelevance Mackintosh (1973) and Baker and Mackintosh (1977) also included groups exposed to either CS alone or US alone, and showed that animals given uncorrelated CS/US exposure were much slower to condition than either. A number of subsequent studies have provided further evidence that learned irrelevance represents more than the sum of CS and US preexposure effects, supporting the idea of a reduction in CS associability as a result of uncorrelated CS/US exposure (Baker & Mackintosh, 1979; Bennett, Maldonado, & Mackintosh, 1995; Bennett et al., 2000; Matzel, Schachtman, & Miller, 1988; but see Bonardi & Hall, 1996; Bonardi & Ong, 2003).

We saw earlier that the Rescorla–Wagner and Mackintosh models take very different views of the processes underlying blocking. Rescorla–Wagner explains blocking in US-processing terms: prior learning that A is a good predictor of the US reduces the surprise caused by presentation of the US on AB trials, and so (given that it is surprise that drives learning) little is learnt about B. The Mackintosh model instead explains blocking in terms of associability changes. Prior conditioning of A means that A is a better predictor of the US on AB trials than is B. As a result the associability of B will decline, leading to a retardation in learning about B. So is there any evidence for a role of associability processes in blocking?

A number of experiments employing “unblocking” procedures indicate that the answer is yes. If A is followed by a single delivery of food (US1), and AB is then followed by two successive food deliveries (US1 . . . US2), more excitatory conditioning to B is observed than for a standard blocking contingency, in which A and AB are both followed by a single US (or one in which A and AB are both followed by two food deliveries) (Dickinson & Mackintosh, 1979; Holland, 1984, 1988; a similar experiment employing shock instead of food USs is presented by Dickinson, Hall, & Mackintosh, 1976). This unblocking as a result of an unexpected upshift in food deliveries on compound trials is consistent with the view of blocking taken by the Mackintosh model. The added US (US2) on the first compound trial is surprising and is predicted by B as well as it is by A. Consequently, US2 will serve to maintain B’s associability, promoting its ability to enter into excitatory associations with US1.⁴ Unblocking by addition of a surprising reinforcer is, however, also in line with the predictions of the Rescorla–Wagner model. As US2 is surprising it will support additional excitatory conditioning, some of which will accrue to B. According to this approach, then, unblocking is a result of excitatory conditioning of B with respect to the surprising US2. Evidence against this latter view is provided by Holland’s (1988) study in which US2 was qualitatively different to US1 (sucrose solution versus solid food pellets) and generated a conditioned response with a different response topography, such that on test it was possible to determine the nature of learning about the added CS on compound trials. This experiment indicated that the major function of the added US2 was to enhance the development of associations between the added CS and the original reinforcer, US1. This finding supports the idea that unblocking by addition reflects the operation of associability processes, wherein the added US2 serves to maintain the associability of the added CS and hence promotes the formation of associations between that CS and US1.

Further evidence in support of a CS-processing approach to unblocking by addition is provided by Mackintosh (1978), who demonstrated that inserting trials on which the compound AB is followed by US1 alone before AB → US1 . . . US2 trials resulted in very little conditioning accruing to B on these latter trials. In other words, the inclusion of trials on which the AB compound was followed by the same reinforcement as was A alone attenuated the unblocking effect that would normally be expected as a result of the addition of US2 on the latter compound trials. Quite why this should be so from the US-processing viewpoint typified by the Rescorla–Wagner model is unclear. The occurrence of US2 on AB → US1 . . . US2 trials is just as surprising following A → US1, AB → US1 training as it is following A → US1 trials alone, and hence it should support just as much learning about B. The CS-processing account of unblocking fares better. Over the course of AB → US1 trials the associability of B will decline (as it is a poorer predictor of US1 than is A). It will therefore begin AB → US1 . . . US2 trials with a low associability and as a result will be unable to enter into significant associations with either US1 or US2 over the course of these trials. Mackintosh and Turner (1971) obtained similar results when conditioning to B (i.e., unblocking) was achieved by increasing the strength of a single outcome on compound trials, rather than by adding a second outcome. If weak shock is denoted ‘us’, and strong shock ‘US’, then significant conditioning to B is observed if A → us trials are followed by AB → US trials; if, however, four AB → us trials are inserted before the compound trials with the stronger shock, little responding to B is observed. The associability of B must have declined during the course of these compound trials with the weaker shock, presumably as that shock was better predicted by the presence of A.


⁴ In fact, unblocking is still predicted even if generalization of Stage 1 A–US1 learning means that A begins Stage 2 as a slightly better predictor of US2 than is B. While this will result in a decline in B’s associability, this decline will be slower than for a control group in which AB is followed by US1 only, this being well predicted by the presence of A.

But perhaps the most powerful evidence for a role of associability processes in blocking comes from demonstrations of unblocking following the omission of an expected reinforcer (Dickinson et al., 1976; Dickinson & Mackintosh, 1979; Holland, 1984, 1988). In these studies trials on which A is followed by US1 . . . US2 precede trials on which the compound AB is followed by US1 alone. Just as in the case of unexpected addition of US2 on compound trials, omission of the expected US2 on compound AB trials leads to unblocking (greater excitatory conditioning to B than that observed in an A → US1 . . . US2, AB → US1 . . . US2 control group). This is clearly incompatible with the US-processing view of learning taken by the Rescorla–Wagner model: Omission of an expected outcome on compound trials should produce inhibitory, not excitatory, conditioning. Instead it would seem that the surprising omission of US2 somehow facilitates the association of B with US1. This is again predicted by the view of associability taken by the Mackintosh model. On the initial AB trial, US2 is absent but expected on the basis of A. B is a better predictor of the absence of this outcome than is A, and hence (according to Equations 8 or 14) its associability will be maintained at a higher level across compound trials than if US2 had been presented on those trials.

Demonstrations of unblocking by reinforcer omission are counterintuitive, in that most theories predict that a downshift in the overall magnitude of reinforcement occurring on compound trials will endow the added CS, B, with inhibitory properties, whereas in the studies outlined above the opposite result was seen. Even the extended Mackintosh model described earlier predicts that any positive effect of reinforcer omission on B’s associability (enhancing the formation of B–US1 associations) will be countered to some extent by formation of inhibitory B–US2 associations. Consistent with this idea, Holland (1988) demonstrated that downshifts in reinforcer number (from US1 . . . US2 to US1 alone) did indeed result in both enhancement of formation of B–US1 associations and development of inhibitory B–US2 associations. Perhaps, then, in the studies mentioned so far it is the former effect that dominates, leading to net excitation. This is not always the case. In the studies described above, moving from two temporally distinct USs to one on compound trials resulted in net excitation. In contrast, we saw earlier that decreasing the intensity of a single shock on compound trials established inhibition to the added cue (Cotton et al., 1982; Wagner et al., 1980). These results imply that temporal separation of the two USs is important if the added cue is to become an excitor. In line with this idea, Holland (1988) found that downshift procedures generated net inhibition with short US1–US2 intervals, but net excitation with longer US1–US2 intervals. This observation is consistent with the idea that the “indirect” effects of reinforcer omission on the added cue (enhancement of its ability to enter novel associations) and the “direct” effects (establishment of conditioned inhibition) are described by different temporal gradients. That is, the influence of reinforcer omission on associability seems to be effective over a wider temporal window than does the influence of omission on associative strength. These differing temporal gradients for associability and associative strength are clearly not captured by any of the trial-based models presented here, in which temporal factors are disregarded in exchange for the simplicity of viewing each learning episode as a separate event. Nevertheless, these findings have important implications for the characterization of associability and associative strength changes in any real-time model of learning.


The Pearce–Hall (1980) model

So far we have focused on phenomena of conditioning (learned irrelevance, unblocking by addition or omission of reinforcers) providing evidence for a role of associability processes in the determination of associative change, specifically phenomena in line with the approach to associability taken by Mackintosh (1975). However, there exist other demonstrations of CS-processing modulation that present more of a challenge to this model. One such challenge is seen in the phenomenon of “Hall–Pearce negative transfer”. Hall and Pearce (1979) demonstrated that rats were slower to develop conditioned suppression to a tone paired with a strong shock if they had previously been exposed to consistent pairings of that tone with a weak shock than if they had received no prior exposure to the tone. Once again, this finding is problematic for theories specifying learning to be governed solely by changes in processing of the US. If anything, pairing of the tone with a weak shock would be expected to result in more rapid acquisition of conditioned responding as a result of pairing the same tone with a strong shock, as the tone will begin these latter trials with a certain amount of excitatory strength, thus giving it a “head-start”. Instead this effect seems to require that consistent pairings of the tone with a weak outcome result in changes in the processing of that tone, altering its ability to enter into association with a stronger version of the same outcome. The view of learned associability taken by the Mackintosh model seems to fare no better than the US-processing view, however. During the first phase, consistent pairings of the tone and weak shock should allow the tone to maintain a high associability (as it is the best available predictor of this weak shock). As a result, processing of the tone on tone–strong-shock trials should be at least as strong as for the control group given no preexposure to the tone (for whom the tone has not previously been established as a good predictor), with the attendant prediction that conditioned responding to the tone should develop at least as rapidly, if not more so, in the experimental group.

Instead demonstration of Hall–Pearce negative transfer seems to imply a view of associability processes that is diametrically opposite to that taken by Mackintosh (1975). That is, it implies that processing of the tone is reduced as a result of consistent pairings of that tone with an outcome (weak shock). This makes contact with the phenomenon of latent inhibition, wherein repeated nonreinforced preexposure to a stimulus (establishing that stimulus as a good predictor of “no outcome”) produces a retardation in the rate of subsequent conditioning of that stimulus (the CS preexposure effect discussed earlier; see Lubow, 1989). Effects such as Hall–Pearce negative transfer and latent inhibition led Pearce and Hall (1980) to develop their own CS-processing model of learning. They argued that, while reliable predictors of outcomes should be able to control an animal’s behaviour, there is little point in allocating a large proportion of the processing capacity for learning to events that are involved in stable relationships. Instead, it would seem to make more sense to devote processing power to stimuli whose predictive status is currently unclear in an attempt to learn more rapidly about the true significance of those stimuli. In other words, unlike the Mackintosh model, Pearce and Hall argue that stimuli that are unreliable predictors of outcomes will maintain a higher associability than stimuli that are reliable predictors.

It should be noted, however, that the Pearce–Hall formulation is not the only CS-processing model that is able to account for latent inhibition. Wagner’s (1981) standard operating procedures (SOP) model proposes that processing of a stimulus depends upon whether or not that stimulus is surprising (see also McLaren, Kaye, & Mackintosh, 1989). Repeated exposure to a CS in a given context will encourage the growth of associations between the context and that CS. Thus the context comes to predict the presence of the CS, rendering it less surprising and hence less able to engage in subsequent conditioning, yielding the latent inhibition result. The model is able to account for Hall–Pearce negative transfer in similar fashion (Swartzentruber & Bouton, 1986). A prediction of SOP’s account is that latent inhibition should be context specific: If latent inhibition relies on the development of associations between exposure context and CS, conducting conditioning trials in a different context should restore the “surprisingness” of the CS and hence abolish the latent inhibition effect. This prediction, which lies beyond the scope of the Pearce–Hall model, has received empirical support (e.g., Channell & Hall, 1983; Hall & Minor, 1984; Lovibond, Preston, & Mackintosh, 1984; Rosas & Bouton, 1997). Wagner’s model, on the other hand, is unable to account for the effects of predictive accuracy on associability; that is, it fails to provide a satisfactory account of the way in which experience of relationships between stimuli and outcomes affects the subsequent processing of those stimuli (Hall, 1991). The Pearce–Hall model is more successful in this respect. Given that it is these effects of associative history on learning that provide the basis for the present article, in the following discussion I will focus on the approach to CS-processing offered by the Pearce–Hall model in preference to Wagner’s theory. For a more detailed discussion of the similarities and differences of the Pearce–Hall and Wagner models, see Hall (1991).

Specifically, the Pearce–Hall model⁵ states that, following the adjustment of associative strengths on trial n (see below), the associability of each presented stimulus is updated according to the equation:

$$\alpha_A^{n} = \gamma\,|\lambda^{n} - \Sigma V_{NET}^{n-1}| + (1 - \gamma)\,\alpha_A^{n-1} \tag{16}$$

$\gamma$ is a parameter (which can vary between 0 and 1) that determines the extent to which $\alpha$ is determined by the events of the immediately preceding trial. If $\gamma \approx 1$ then $\alpha$ is determined almost solely by the events of the immediately preceding trial, with earlier trials having little effect. Conversely, if $\gamma \approx 0$ then $\alpha$ is determined largely by earlier trials, with the immediately preceding trial having little effect. $\Sigma V_{NET}^{n-1}$ is defined by:

$$\Sigma V_{NET}^{n-1} = \Sigma V^{n-1} - \Sigma \bar{V}^{n-1} \tag{17}$$

where $\bar{V}$, as earlier, refers to the strength of a CS–no-US association (i.e., an inhibitory association or antiassociation). In other words, $\Sigma V_{NET}^{n-1}$ represents the extent to which the US occurring on the current trial was predicted by all stimuli presented on that trial.
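As a rough illustration of Equations 16 and 17, the following Python sketch computes the net prediction for a trial and then updates associability. The function names and the numerical values are assumptions made for this example only; they are not taken from the model's original specification.

```python
def net_prediction(v_exc, v_inh):
    """Equation 17: summed excitatory strength minus summed inhibitory
    strength of the cues present on the trial."""
    return sum(v_exc) - sum(v_inh)


def update_associability(alpha_prev, lam, v_net, gamma):
    """Equation 16: weighted average of the current absolute prediction
    error and the previous associability."""
    return gamma * abs(lam - v_net) + (1.0 - gamma) * alpha_prev


# Assumed illustrative case: one cue that already predicts the US perfectly,
# so the absolute error is zero on every trial.
alpha = 0.8
for _ in range(5):
    v_net = net_prediction([0.8], [0.0])
    alpha = update_associability(alpha, lam=0.8, v_net=v_net, gamma=0.4)
```

With a perfectly predicted outcome the error term vanishes, so in this sketch associability simply shrinks by a factor of (1 − γ) on each trial.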


⁵ In the original Pearce–Hall (1980) model, the associability of a stimulus was determined entirely by events occurring on the immediately preceding trial. Pearce, Kaye, and Hall (1981) noted that this analysis had a number of unfortunate consequences and therefore amended the model such that associability was determined by the average of the $\alpha$ values for a number of preceding trials, thereby damping changes in associability. Specifically their inclusion of a $\gamma$ parameter implemented an exponentially weighted moving average across trials for $\alpha$. So the current value of $\alpha$ is strongly influenced by the events of the immediately preceding trial, less so by the trial before that, less still by the trial before that and so on. Pearce, Kaye, and Hall also proposed inclusion of a reinforcer-related learning-rate parameter, $\beta$, which could take different values depending on the nature of the reinforcer used (as employed by the extended Mackintosh model in Equations 10 and 11). It is this amended version of Pearce and Hall’s original model that is presented here.

Now we need the equations governing the change in associative (and antiassociative) strength on trial n (note that these changes occur immediately preceding the change in associability according to Equation 16 above). Just as for the extended Mackintosh model presented earlier, the associative change undergone on a given trial depends on the nature of learning that that trial will support. So as before we calculate:

$$R^{n} = \lambda^{n} - (\Sigma V^{n-1} - \Sigma \bar{V}^{n-1}) \tag{18}$$

If R is positive (i.e., this is a trial that will support excitatory learning), the strength of the CS–US association is increased according to the equation:

$$\Delta V_A^{n} = \beta_E \cdot \alpha_A^{n-1} \cdot \lambda^{n} \tag{19}$$

If R is negative (i.e., this is a trial that will support inhibitory learning), the strength of the CS–no-US association is increased according to the equation:

$$\Delta \bar{V}_A^{n} = \beta_I \cdot \alpha_A^{n-1} \cdot |R^{n}| \tag{20}$$

where $\beta_E$ and $\beta_I$ are learning-rate parameters for excitatory and inhibitory learning, respectively.
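The trial-level computations in Equations 16 to 20 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name, the dictionary-based state, and the example values are all hypothetical, and the associability update uses the net prediction computed before the strength changes, which is one reading of the n − 1 superscripts.

```python
def pearce_hall_trial(cues, lam, V, Vbar, alpha, beta_e, beta_i, gamma):
    """Apply one trial to the dicts V, Vbar and alpha (keyed by cue name)."""
    # Equation 17: net prediction from all cues present, before any update.
    v_net = sum(V[c] - Vbar[c] for c in cues)
    # Equation 18: the sign of R decides which kind of learning occurs.
    r = lam - v_net
    for c in cues:
        if r > 0:
            V[c] += beta_e * alpha[c] * lam            # Equation 19
        elif r < 0:
            Vbar[c] += beta_i * alpha[c] * abs(r)      # Equation 20
    # Equation 16: associability is updated after the strength changes.
    for c in cues:
        alpha[c] = gamma * abs(lam - v_net) + (1.0 - gamma) * alpha[c]
    return r


# Assumed example: first pairing of a novel cue A with the US.
V, Vbar, alpha = {"A": 0.0}, {"A": 0.0}, {"A": 0.8}
r = pearce_hall_trial(["A"], 0.8, V, Vbar, alpha,
                      beta_e=0.1, beta_i=0.01, gamma=0.4)
```

Note that the excitatory branch contains no error term: the increment depends only on the learning rate, the cue's associability, and the outcome magnitude.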

We can see from Equation 19 above that the Pearce–Hall model goes one step further than Mackintosh (1975) in its view of the role of associability processes in learning. For excitatory learning at least, this model places the entire burden of modulation of associative change on processing of the CS: there is no error term in the calculation of associative change. That is to say, processing of the US does not change (as a result, say, of changes in its surprisingness) over the course of learning.⁶

⁶ The same principle could be seen as applying to inhibitory learning. However the R term in Equation 20 is a summed error term. As such, changes in processing of the US will affect the change in strength of CS–no-US associations.

In order to see how the Pearce–Hall model can function in the absence of an error term in the equation for associative change, it is perhaps easiest to apply it to a simple example. Consider pairing a novel cue, A, with an outcome. On the initial conditioning trials A is a poor predictor of the US. As a result the value of $|\lambda - \Sigma V_{NET}|$ in Equation 16 will be high, such that A maintains a high associability, which drives increases in $V_A$ according to Equation 19. However, as conditioning proceeds the associability of A will decline (as $V_A$ increases, $|\lambda - \Sigma V_{NET}|$ decreases), with changes in associative strength becoming increasingly small. Ultimately, when $V_A = \lambda$ the associability of A will be negligible, and hence learning will effectively come to a stop (if $V_A$ should “overshoot” and increase slightly above $\lambda$ then the inhibitory process described by Equation 20 will kick in to reduce it again). At this point, then, A is able to control the animal’s behaviour (in that it will give rise to strong conditioned responding), but little processing power is devoted to learning about it. An account of Hall–Pearce negative transfer follows naturally from this idea. Consistent pairing of the tone and weak shock will establish the tone as a good predictor of this weak shock and will hence lead to a decline in its associability, such that little is learnt on subsequent pairings of the tone and strong shock. Consistent with this idea, Hall and Pearce (1982) found that inserting two trials on which the tone was followed by no shock immediately prior to tone–strong-shock pairings reduced the negative transfer observed (i.e., led to more rapid conditioning on these tone–strong-shock trials). The Pearce–Hall model views these inserted trials as restoring the associability of the tone by virtue of the fact that the “outcome” on these trials is surprising (no shock occurs where a weak shock is predicted).
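The account of negative transfer given above can be illustrated with a small single-cue simulation of Equations 16 to 20. Everything specific here is an assumption made for the sketch: the outcome magnitudes for weak and strong shock, the number of trials in each phase, the parameter values, and the helper names. It is not a reconstruction of Hall and Pearce's procedure.

```python
def run_trials(n, lam, state, beta_e=0.1, beta_i=0.01, gamma=0.4):
    """Run n tone -> outcome trials of magnitude lam on a single cue."""
    for _ in range(n):
        v_net = state["V"] - state["Vbar"]             # Equation 17
        r = lam - v_net                                # Equation 18
        if r > 0:
            state["V"] += beta_e * state["alpha"] * lam          # Equation 19
        elif r < 0:
            state["Vbar"] += beta_i * state["alpha"] * abs(r)    # Equation 20
        # Equation 16
        state["alpha"] = gamma * abs(lam - v_net) + (1.0 - gamma) * state["alpha"]
    return state


def new_cue():
    """A novel cue with assumed starting associability of 0.8."""
    return {"V": 0.0, "Vbar": 0.0, "alpha": 0.8}


WEAK, STRONG = 0.2, 0.8   # assumed outcome magnitudes

# Experimental group: tone -> weak shock, then tone -> strong shock.
pre = run_trials(40, WEAK, new_cue())
alpha_after_weak = pre["alpha"]
start = pre["V"] - pre["Vbar"]
run_trials(4, STRONG, pre)
gain_pre = (pre["V"] - pre["Vbar"]) - start

# Control group: novel tone -> strong shock.
ctrl = run_trials(4, STRONG, new_cue())
gain_ctrl = ctrl["V"] - ctrl["Vbar"]
```

Under these assumptions the tone's associability has fallen by the end of weak-shock training, so the gain in net associative strength over the first strong-shock trials (`gain_pre`) is smaller than in the control group (`gain_ctrl`).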

We saw earlier that the Mackintosh model specifies changes in a cue’s associability to be governed by a separable error term ($\lambda - V_A$ in Equation 8, with analogues in Equations 13 and 14). Hence the associability of each element of a stimulus compound is determined by how well that element alone predicts the current outcome. In contrast, the Pearce–Hall model specifies associability changes to be governed by a summed error term ($|\lambda - \Sigma V_{NET}|$ in Equation 16). This has important ramifications for compound conditioning, as it means that the associability of each element of a compound is determined by how well that compound predicts the current outcome.
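To make the contrast concrete, consider an illustrative case (the values are assumed for this example only) in which A has been pretrained so that its strength equals the outcome magnitude of 0.8, while B is novel with zero strength. On an AB compound trial the two kinds of error term evaluated for B are:

```latex
\begin{aligned}
\text{separable error for B:}\quad & \lambda - V_B = 0.8 - 0 = 0.8 \\
\text{summed error:}\quad & |\lambda - \Sigma V_{NET}| = |0.8 - (0.8 + 0)| = 0
\end{aligned}
```

B taken alone is a poor predictor of the outcome (a large separable error), whereas the compound as a whole predicts it perfectly (a summed error of zero).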

For example, the use of a summed error term to determine associability allows the Pearce–Hall model to account for blocking. During the first phase, A+ trials will establish A as a good predictor of the outcome ($V_A \approx \lambda$), and $\alpha_A$ will decline correspondingly. In the second phase, the AB compound is followed by the US. The presence of A will ensure that the US is already well predicted on these trials, however; the outcome following AB trials is not surprising. Therefore little processing power will be devoted to the elements of this compound: the low value of $|\lambda - (V_A + V_B)|$ will cause $\alpha_B$ to decline rapidly. As a result little will be learnt about B, compared to a control group not given pretraining with A such that the occurrence of the US on AB compound trials is more surprising. Figure 6 shows the results of a simulation of a blocking experiment with the Pearce–Hall model. The experimental design employed was exactly the same as that modelled earlier by the extended Mackintosh model: 20 A+ trials, followed by 8 AB+ trials intermixed with 8 CD+ trials. Parameters used for this simulation were: $\beta_E$ = .1, $\beta_I$ = .01, $\gamma$ = .4, starting value of $\alpha$ = .8, $\lambda$ (US present) = .8, $\lambda$ (US absent) = 0, although the prediction of blocking is parameter independent. Blocking is observed in that the associative strength attained by B following Stage 2 is lower than that for C or D.
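The blocking design described above can be sketched in Python using the stated parameter values. This is an illustrative sketch rather than the simulation that produced Figure 6: the function name, the strict alternation of AB+ and CD+ trials, and the use of the pre-update net prediction in the associability rule are assumptions.

```python
def trial(cues, lam, V, Vbar, alpha, beta_e=0.1, beta_i=0.01, gamma=0.4):
    """One Pearce-Hall trial (Equations 16-20) for the cues presented."""
    v_net = sum(V[c] - Vbar[c] for c in cues)          # Equation 17
    r = lam - v_net                                    # Equation 18
    for c in cues:
        if r > 0:
            V[c] += beta_e * alpha[c] * lam            # Equation 19
        elif r < 0:
            Vbar[c] += beta_i * alpha[c] * abs(r)      # Equation 20
    for c in cues:                                     # Equation 16
        alpha[c] = gamma * abs(lam - v_net) + (1.0 - gamma) * alpha[c]


LAMBDA = 0.8                       # outcome magnitude when the US is present
cues = "ABCD"
V = {c: 0.0 for c in cues}
Vbar = {c: 0.0 for c in cues}
alpha = {c: 0.8 for c in cues}     # starting associability

for _ in range(20):                # Stage 1: 20 A+ trials
    trial("A", LAMBDA, V, Vbar, alpha)
for _ in range(8):                 # Stage 2: AB+ and CD+ (alternation assumed)
    trial("AB", LAMBDA, V, Vbar, alpha)
    trial("CD", LAMBDA, V, Vbar, alpha)

net = {c: V[c] - Vbar[c] for c in cues}   # net strength of each cue
```

In this sketch the blocked cue B ends Stage 2 with lower net strength than the control cues C and D, which are treated identically and so finish with equal strength.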

The Pearce–Hall model is also able to account for the various manipulations described earlier that produce an “unblocking” effect. Both addition of an unexpected reinforcer and omission of an expected reinforcer on Phase 2 compound trials mean that the outcome occurring on these trials is in some way surprising (i.e., |λ – ΣV_NET| will be greater than if the outcome had remained the same in the two cases), and hence B’s associability will be maintained as compared to a control group experiencing the same outcome in both phases.

Further evidence supporting the Pearce–Hall view of associability is provided by the observation that, under certain conditions, learning about a stimulus is more rapid when that stimulus is an inaccurate predictor of the events that follow it than when it is an accurate predictor (Kaye & Pearce, 1984; Swan & Pearce, 1988; Wilson, Boumphrey, & Pearce, 1992). For example, Wilson et al. trained rats on a partially reinforced serial conditioning procedure. A light was followed by a tone; on half of the trials this tone terminated with delivery of a food US, and on the other half no food US was provided. This procedure resulted in only minimal conditioned responding (magazine orienting) to the light. These rats were then divided into two groups. For group consistent, training continued as before. For group shift, the tone was omitted on all nonreinforced trials. Hence while the relation between the light CS and the food US was maintained for group shift, its relation to the tone was changed from 100% to 50%. According to the Pearce–Hall model, this will result in a higher associability for the light in group shift, where the event following the light (the tone) is unpredictable, as compared to group consistent, where the tone consistently followed the light. In line with this idea Wilson et al. found that, when the light was subsequently paired directly with food, rats in group shift were faster to develop conditioned responding to the light than those in group consistent. This seems to run against the principles of the Mackintosh model, which would, if anything, predict that degrading the light–tone contingency would lead to a decrement in the associability of the light.

Figure 6. Simulation of a blocking experiment using the Pearce–Hall model. Upper panel: Net associative strength (V – V̄) for Cues A, B, and C/D. During Stage 2, B gains less excitatory strength than the control cues, C/D, thus demonstrating blocking. Lower panel: Associability changes for A, B, and C/D. α_B falls during Stage 2 as the occurrence of the US on AB+ trials is unsurprising, being already predicted by A.

Human causal learning

Studies of animal conditioning are in some sense analogous to studies of causal learning in humans. Both involve arranging for a contingent relationship between stimuli (CSs and USs in animal conditioning; cues and outcomes in causal learning), and the measurement of learning about that relationship (conditioned responding or a judgement of the “causal strength” of a cue). In support of the idea of a parallel between these two preparations, similar factors are seen to influence both in similar ways: animal conditioning and human causal judgement show similar sensitivity to temporal contiguity and contingency, for instance. These parallels led Dickinson, Shanks, and Evenden (1984) to suggest that similar processes might underlie animal conditioning and human causal judgement, with the attendant implication that models of animal conditioning might be used to describe the acquisition of human causal judgements (see De Houwer & Beckers, 2002, for a review of recent studies of parallels between conditioning and causal learning).

Given this possibility of a common process underlying animal conditioning and human causal learning, and the weight of evidence indicating a role for associability processes in animal learning, we might expect human causal learning to show a similar sensitivity to associative history. In line with this idea, several recent studies have demonstrated that the processing of cues in a causal learning task can vary depending on the training history of those cues.

We saw earlier that the US-processing account typified by the Rescorla–Wagner model and the CS-processing account offered by the Mackintosh model differ in their explanation of the blocking effect. The former explains blocking as an effect of reduced processing of the US (the outcome is unsurprising on AB compound trials as it is already predicted by A), the latter as an effect of reduced processing of the added CS (B’s associability falls as it is a poorer predictor of the US than is A). Kruschke and Blair (2000) investigated the source of the blocking effect in human causal learning and found evidence favouring this latter approach. They employed a multiple-outcome medical diagnosis paradigm, in which participants had to decide which of several possible diseases (outcomes) a patient was suffering from on the basis of the symptoms exhibited by that patient (cues). In the blocking contingency, A was established as a good predictor of Outcome 1 before pairings of an AB compound with Outcome 1. In the control contingency, a compound CD was paired with Outcome 2, neither element having received prior training. A blocking effect was observed in that, following this training, B was perceived to be a weaker predictor of Disease 1 than was D of Disease 2. This blocking effect is, of course, predicted by either class of theory mentioned above. However, Kruschke and Blair demonstrated that participants were slower to learn a subsequent relationship between B and a novel outcome (Disease 3) than they were to learn an identical relationship between a novel cue (E) and a novel outcome (Disease 4), indicating that the blocking treatment had led to a decline in B’s associability as compared to this novel control cue. This finding would seem to lie beyond any theory that attempts to explain the blocking effect solely in terms of changes in outcome processing: it requires that the processing of the blocked stimulus must change as a result of the training it undergoes.

It should be acknowledged, however, that it may be possible to explain the results of Kruschke and Blair’s (2000) experiment in terms of differential proactive interference acting on experimental (B) and novel control (E) cues. A further study providing evidence for a role of associability processes in human causal learning by Le Pelley and McLaren (2003; see also Lochmann & Wills, 2003) is harder to interpret in terms of proactive interference. The basic design of their experiment is shown in Table 1. This experiment employed an allergy prediction paradigm, in which participants played the role of a food allergist looking at the causes of allergic reactions in fictitious patients. In this design, Cues A–Y represent different foods, and the numbers 1–4 represent different types of allergic reaction that patients could suffer as a result of eating these foods. On each trial of Stage 1, participants were told the contents of a meal eaten by “Mr. X”, and they were asked to predict the type of allergic reaction that he would suffer as a result, given a choice of Allergy 1 or Allergy 2. In this stage Cues A and D consistently indicated the occurrence of Allergy 1, Cues B and C consistently indicated the occurrence of Allergy 2, and Cues V–Y provided no basis for discrimination between the two outcomes: they were paired with Allergies 1 and 2 an equal number of times. According to the Mackintosh model, then, Cues A–D should maintain a high associability over Stage 1 trials, as they are better predictors of the outcome on each trial than are the cues with which they are paired. Conversely, the associability of Cues V–Y will decrease over the course of Stage 1, as they are poorer predictors of the outcome on each trial than are the cues with which they are paired. In Stage 2, participants were given information regarding foods and allergies for a new patient, Mr. Y. On each of the Stage 2 trial types shown in Table 1, a “good predictor” from Stage 1 (A, B, C, or D) is paired with a “poor predictor” (V, W, X, or Y), and this compound is paired with a novel outcome: Compounds AX and CV are paired with Allergy 3, while BY and DW are paired with Allergy 4. A subsequent causal judgement test revealed that participants had learnt more about the relationships between the good predictors (“good” with respect to Stage 1 outcomes) and the novel outcomes than between the poor predictors and the novel outcomes during Stage 2. That is, following Stage 2, compound AC (composed of two good predictors from Stage 1 that were paired with Allergy 3 in Stage 2) was rated as being strongly causative of Allergy 3, and BD (good predictors paired with Allergy 4 in Stage 2) was rated as being strongly causative of Allergy 4, while VX (composed of poor predictors paired with Allergy 3 in Stage 2) and WY (poor predictors paired with Allergy 4 in Stage 2) were rated as being only weak causes of their respective Stage 2 outcomes. In line with the Mackintosh view, it would seem that the cues that were experienced as being good predictors during Stage 1 (A–D) commanded more processing in Stage 2 than did the cues that were experienced as being poor predictors during Stage 1 (V–Y).

TABLE 1
Basic design of study by Le Pelley and McLaren (2003)

Stage 1     Stage 2     Test
AV → 1
BV → 2
AW → 1      AX → 3      AC
BW → 2      BY → 4      BD
CX → 2      CV → 3      VX
DX → 1      DW → 4      WY
CY → 2
DY → 1

Note: A–Y = foods; 1–4 = allergic reactions. Filler trials omitted for clarity.

The outcome specificity of learned associability

Kruschke and Blair (2000), Le Pelley and McLaren (2003), and Lochmann and Wills (2003) employ a common fundamental approach to investigating associability processes in humans: All train stimuli as being predictive or nonpredictive of certain outcomes and then examine changes in the associability of those stimuli by studying how rapidly these cues will enter into associations with novel outcomes. The fact that these studies do show associability effects indicates that a cue’s associability is not entirely outcome specific. That is, changes in a cue’s associability brought about by experience of its predictive relationship with one outcome will, under some circumstances at least, affect subsequent learning about a relationship between that cue and a different outcome. This concept of the outcome specificity of associability makes contact with earlier studies of animal conditioning. For instance, Mackintosh (1973) found that giving rats uncorrelated exposure to a tone CS and water US led to retardation (i.e., a learned irrelevance effect) in subsequent conditioning of a tone–water relationship, but had no effect on conditioning of a tone–shock relationship. Likewise, uncorrelated exposure to tone and shock led to a retardation in tone–shock conditioning but had no effect on learning about a tone–water relationship. In this case, then, changes in the associability of the tone as a result of experience of its relationship with one US were able to modulate its ability to enter into associations with that same US, but did not affect learning about the relationship between that cue and a different US. This experiment indicates that associability is an outcome-specific property. It is tempting to compare the human and animal studies in an attempt to elucidate the true specificity of associability. In Le Pelley and McLaren’s human experiment the outcomes used in the two stages of the experiment, while qualitatively different, had many similarities (both were types of allergic reaction; more generally, both were aversive), whilst in the Mackintosh study of animal conditioning, the USs used were very different (one appetitive, the other aversive). It is possible that, while associability is not completely outcome specific, the associability developed by a cue with respect to a particular outcome will generalize only to similar outcomes, perhaps only to those of the same affective class. Evidence consistent with this idea comes from Holland’s (1988) study of unblocking caused by the surprising addition of a post-trial reinforcer. As mentioned earlier, the added US2 on compound trials was qualitatively different to the original US1 (sucrose solution vs. solid food pellets) but, in contrast to Mackintosh’s experiment, was from the same affective class (both appetitive). Holland found that addition of this different US2 enhanced the development of associations between the added CS and US1 at least as well as if US1 and US2 were identical. In line with the idea that associability will “transfer” between outcomes from the same affective class (as indicated by the human studies of Kruschke & Blair and Le Pelley & McLaren), experience of the predictive relationship between the added CS and appetitive US2 was able to modulate learning about that CS with respect to appetitive US1, despite the fact that these outcomes were qualitatively different.

RECONCILING EFFECTS OF ASSOCIATIVE HISTORY: A HYBRID MODEL

In summary, then, there is a wealth of evidence in support of the idea that the amount of processing received by a CS on a given learning episode depends to at least some extent on the associative history of that CS. Phenomena of discrimination learning such as the overtraining reversal effect and intradimensional shift advantage, and phenomena of conditioning such as learned irrelevance, unblocking by surprising omission of a reinforcer, and Hall–Pearce negative transfer, rule out the view taken by the Rescorla–Wagner model, namely that associative change on a given trial is determined solely by changes in the processing received by the US. Moreover, similar effects of training history are seen to affect the learning undergone by cues in studies of human causal learning, in line with Dickinson et al.’s (1984) argument that common associative processes may underlie animal conditioning and human causal learning.

The question now becomes one of how best to interpret this mass of evidence supporting a role of associability processes in learning. The problem faced in attempting to construct a “unified” theory of associability is that the experiments outlined above often conflict in the view of associability that they support. For example, the overtraining reversal effect, intradimensional shift advantage, and learned irrelevance all support an approach wherein good predictors of an outcome maintain a high associability, while the associability of poorer predictors falls; this is the approach developed in the Mackintosh model. The results of Le Pelley and McLaren’s (2003) study of human learning also provide unique support to this view. However, Hall–Pearce negative transfer and the fact that, under some circumstances, learning is seen to proceed faster with stimuli that are inaccurate rather than accurate predictors of the events that follow them (Kaye & Pearce, 1984; Swan & Pearce, 1988; Wilson et al., 1992) support an opposing view, wherein poor predictors maintain a higher associability than good predictors; this is the approach developed in the Pearce–Hall model. The Pearce–Hall model also provides a more satisfactory account of latent inhibition (the detrimental effect of nonreinforced preexposure to a stimulus on the rate of subsequent conditioning of that stimulus) than does the Mackintosh model (see Hall, 1991, for an exhaustive discussion of Pearce–Hall and Mackintosh models in relation to latent inhibition and other phenomena of exposure learning). In addition, there exist other bipartisan phenomena that are in line with both views of associability processes; for example, both can account for unblocking by addition or omission of post-trial events, although they do so in quite different ways.

Given this conflict in the empirical data, with reliable evidence supporting two opposing views of associability, it seems unlikely that either the Mackintosh or the Pearce–Hall model alone will be able to provide a full account of the way in which the processing afforded to a CS changes as a result of experience. Maybe a more fruitful approach would be to combine the ideas encapsulated in those two models in an attempt to capture the strengths of each.

Perhaps the simplest way to reconcile the two theories of associability mentioned here is to see them as describing two different properties of a cue, rather than being rival descriptions of the same property. The Mackintosh model can be viewed as measuring the weight that should be afforded to a particular stimulus in the learning process as compared to other potential stimuli. In some sense, then, the Mackintosh model describes an “attentional associability”, determining which stimuli should have access to the learning process and which should not. The Pearce–Hall model, on the other hand, can be seen as measuring the rate with which each stimulus will be learnt about on the basis of the exposure history of that stimulus, regardless of its attentional weight. As such, the Pearce–Hall formulation indexes a property that might be called “salience associability”. So we can see the Mackintosh alpha as allowing an animal to pick out which stimuli it should learn about and the Pearce–Hall alpha as determining how much should be learnt about those stimuli. Rather than being at loggerheads, it seems that these theories could be made to work in concert and, in doing so, provide a more satisfactory account of the effects of associative history on the associative change undergone by a given stimulus on a given trial. Below I present one way in which this combination could be achieved, but clearly many others are possible.

Given the proposed difference in the quantities described by the Mackintosh and Pearce–Hall models, it makes little sense to label them both with the same symbol. Hence in the following discussion I refer to the “attentional associability” of the Mackintosh model as α, and the “salience associability” of the Pearce–Hall model as σ. The simplest way to incorporate these two properties into a model of associative learning is to insert them as multiplicative factors in the equation for associative change, yielding a “hybrid” model of associability. The idea of a hybrid model of associability has previously been suggested by Pearce, George, and Redhead (1998) and Rodriguez, Lombas, and Alonso (2002). The model offered in the current paper represents an advanced implementation of this earlier suggestion. The approach taken here is to add the Pearce–Hall σ into the extended Mackintosh model described earlier; the opposite approach (adding the Mackintosh α into the Pearce–Hall model) would presumably also be possible. Insertion of a multiplicative σ factor into Equations 10 and 11 yields the following equations.

If R is positive (i.e., this is a trial that will support excitatory learning), the strength of the CS–US association is increased according to the equation

$$\Delta V_A = \alpha_A \sigma_A \beta_E \cdot (1 - V_A + \bar{V}_A) \cdot |R| \qquad (21)$$

If R is negative (i.e., this is a trial that will support inhibitory learning), the strength of the CS–US antiassociation is increased by:

$$\Delta \bar{V}_A = \alpha_A \sigma_A \beta_I \cdot (1 - \bar{V}_A + V_A) \cdot |R| \qquad (22)$$

where β_E and β_I are learning-rate parameters for excitatory and inhibitory learning, respectively.
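As a minimal sketch, Equations 21 and 22 can be written as a single Python function that returns the change in the excitatory association and in the antiassociation for one cue on one trial. The summed error R is taken as an input because its definition belongs to the extended Mackintosh model presented earlier. The function name, argument names, and the numerical values in the usage lines are illustrative assumptions, not the parameters used in the paper's simulations.

```python
def hybrid_delta_v(alpha, sigma, v_exc, v_inh, r, beta_e, beta_i):
    """Return (delta_V, delta_V_bar) for one cue on one trial.

    alpha : attentional associability (Mackintosh)
    sigma : salience associability (Pearce-Hall)
    v_exc : current excitatory strength V
    v_inh : current antiassociation strength V-bar
    r     : summed error term R for the trial (supplied by the caller)
    """
    if r > 0:
        # Equation 21: excitatory learning, separable error (1 - V + V-bar).
        return alpha * sigma * beta_e * (1 - v_exc + v_inh) * abs(r), 0.0
    if r < 0:
        # Equation 22: inhibitory learning, separable error (1 - V-bar + V).
        return 0.0, alpha * sigma * beta_i * (1 - v_inh + v_exc) * abs(r)
    return 0.0, 0.0  # R = 0: outcome fully predicted, no change


# Illustrative values only (hypothetical, not taken from the paper).
print(hybrid_delta_v(0.9, 0.9, 0.0, 0.0, 0.8, beta_e=0.1, beta_i=0.01))
print(hybrid_delta_v(0.9, 0.9, 0.5, 0.0, -0.2, beta_e=0.1, beta_i=0.01))
```

Because α and σ enter only as a product, halving either one halves the change in strength on that trial; the two quantities differ in how they are updated and in the ranges they are allowed to span.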

The equations for changing α are exactly the same as those of the extended Mackintosh model (Equations 13 and 14), and the equation for changing σ is exactly the same as that of the Pearce–Hall model (Equation 16). Crucially, in this hybrid model attentional associability operates over a wider range than does salience associability. Attentional associability, α, is constrained to lie between 1 and .05, while salience associability, σ, is constrained to lie between 1 and .5. That is:

$$\text{If } \alpha_A \geq 1 \text{ then } \alpha_A = 1; \quad \text{if } \alpha_A \leq .05 \text{ then } \alpha_A = .05 \qquad (23)$$

$$\text{If } \sigma_A \geq 1 \text{ then } \sigma_A = 1; \quad \text{if } \sigma_A \leq .5 \text{ then } \sigma_A = .5 \qquad (24)$$

These constraints afford attentional associability a potentially greater importance than salience associability. Reductions in attentional associability can effectively halt learning about a stimulus regardless of its salience associability (if α = .05 in Equations 21 and 22 there can be relatively little change in associative strength, even for a stimulus with high σ). Reductions in salience associability, on the other hand, while attenuating the rate of learning about a stimulus, will not prevent that learning to nearly the same extent (a stimulus with σ = .5 can still undergo relatively large changes in associative strength, especially if it has high α). Remember that, according to the Mackintosh model, stimuli that are good predictors of outcomes will maintain high attentional associability, while the attentional associability for poorer predictors declines. The Pearce–Hall model effectively states the opposite: stimuli will maintain a high salience associability to the extent that the events that follow them are surprising. Given the constraints on α and σ as explained above, this means that a stimulus that is a poor predictor of an outcome (low Mackintosh α, high Pearce–Hall σ) will be learnt about only very slowly, while a stimulus that is a good predictor of an outcome (high Mackintosh α, low Pearce–Hall σ) will be learnt about more rapidly. This assumption seems intuitively plausible: a stimulus that is “unattended” should receive very little processing regardless of the accuracy of the predictions that it makes, while a stimulus that is “attended to” but whose predictions are unsurprising should be given greater weight in the learning mechanism. Nevertheless, in the absence of experimental evidence bearing on the issue of the minimum values of attentional and salience associability, the assumption that α_min < σ_min should perhaps be treated with some caution.
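The asymmetry between the two ranges can be made concrete with a short sketch. The bounds are those stated in the text (α between .05 and 1, σ between .5 and 1); the helper name `clamp` and the constant names are hypothetical, and the example compares only the multiplicative α·σ factor, leaving the learning rate and error terms aside.

```python
ALPHA_RANGE = (0.05, 1.0)  # attentional associability bounds
SIGMA_RANGE = (0.5, 1.0)   # salience associability bounds


def clamp(value, bounds):
    """Constrain value to lie within the (lower, upper) bounds."""
    lower, upper = bounds
    return min(upper, max(lower, value))


# Drive each quantity as low as it can go while holding the other at ceiling.
alpha_floor = clamp(0.0, ALPHA_RANGE)
sigma_floor = clamp(0.0, SIGMA_RANGE)

print(alpha_floor * 1.0)  # minimal alpha, maximal sigma -> 0.05
print(1.0 * sigma_floor)  # maximal alpha, minimal sigma -> 0.5
```

With everything else held equal, a cue at the attentional floor learns at one twentieth of the maximum rate, whereas a cue at the salience floor still learns at half of it.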

The final assumption made in the model presented here is that the starting values of α and σ for a novel stimulus are near their maximum: A value of .9 is used for each in the simulations discussed below. As a result there is the potential for a stimulus to undergo a large decline in its ability to activate the learning process relative to this starting point, but the potential for only a very small increase in this ability. That is to say, this parameterization assumes that associability effects seen empirically will generally reflect reductions in the ability of stimuli to engage the learning process relative to their ability when novel. With regard to the salience associability envisaged by the Pearce–Hall model, this is not a particularly contentious issue: in general that model views the ability of a stimulus to engage the learning process as starting from a high value and falling as a result of any kind of exposure to that stimulus (though some kinds of exposure, e.g., partial reinforcement training, will tend to maintain a high salience associability for longer). Indeed, Pearce and Hall (1980) do not mention the possibility of an increase in the associability of a novel stimulus as a result of any kind of exposure to that stimulus (although following a decline a stimulus’s associability may rise again as a result of manipulations making the events following it more surprising, e.g., addition of an unexpected post-trial reinforcer).

From the point of view of the Mackintosh model, however, the suggestion that empirical associability effects generally reflect decreases, and not increases, in the attentional associability afforded to stimuli is perhaps more debatable. Certain phenomena seem to demand that the attentional associability of stimuli can fall as a result of experience of the predictive history of those stimuli. For instance, it is hard to account for learned irrelevance without allowing for a decrease in the associability of the CS as a result of uncorrelated CS/US preexposure. Recall also Mackintosh and Turner’s (1971; see also Mackintosh, 1978) study of unblocking by increasing reinforcer magnitude. A → us (weak shock) training followed by AB → US (strong shock) trials results in greater conditioned responding to B than that seen in a standard blocking group (A → us then AB → us, or A → US then AB → US). Mackintosh and Turner found that insertion of four compound trials, during which the AB compound was paired with weak shock, before compound training with the strong shock attenuated this unblocking effect, revealing little evidence of conditioning to B. The associability of B must have declined during the course of the four compound trials on which AB was paired with weak shock, presumably because it was a poorer predictor of this weak shock than was A. Seraganian (1979) provides further evidence demonstrating decreases in attentional associability in a study of discrimination learning.

So there exists persuasive evidence supporting the idea that the attentional associability of stimuli can fall from some starting value as a result of experience of those stimuli. But can we take this idea one step further and state that all associability effects reflect decreases in associability of nonpredictive stimuli, or must we allow for increases in the associability of predictive stimuli over and above this starting value? Consider the superior discrimination learning typically observed following an intradimensional (ID) shift in the rewarded and nonrewarded stimuli, as compared to following an extradimensional (ED) shift (George & Pearce, 1999; Mackintosh & Little, 1969; Oswald et al., 2001; Schwartz et al., 1971; Shepp & Schrier, 1969). Let us imagine a study in which the stimuli can vary on two dimensions, colour and shape. For the ID shift group, colour is the relevant dimension for both training and transfer discriminations, while for the ED shift group shape is relevant during training, and colour is relevant during transfer. Typically, more rapid learning of the transfer discrimination is observed for animals undergoing an ID shift than for those undergoing an ED shift. One might view this ID shift advantage as resulting from an increase in the attentional associability of the predictive features (colours) of the original training stimuli: This increased associability of the trained colours then generalizes to the colours of the transfer stimuli, facilitating acquisition of this latter discrimination as these predictive features are more ready to engage the learning process. On this view, the ID shift advantage reflects an increase in the associability of predictive features during training of the original discrimination. An alternative view, however, is that the ID shift advantage reflects a decrease in the associability of nonpredictive features. During training for the ED subjects, the colour of the stimuli is uncorrelated with reinforcement. If the associability of the colours used (or the general dimension of “colour”, following Sutherland & Mackintosh, 1971) declines as a result, this will have a detrimental effect on subsequent acquisition of a discrimination in which colour is relevant. Likewise during training for the ID subjects, the shape of the stimuli is irrelevant: A decline in the associability of this feature will facilitate acquisition of a subsequent discrimination in which shape is again irrelevant. Thus any difference in rate of learning of the transfer discrimination could be ascribed to decreases in the associability of irrelevant stimuli. In fact, there is empirical evidence to support this latter approach to the ID shift advantage: Turrisi, Shepp, and Eimas (1969) found no difference in acquisition of a discrimination following an ID or ED shift if the “irrelevant” dimension had not been present in the first problem.

Likewise, it is possible to account for all of the empirical phenomena described here in support of the view of attentional associability envisaged by the Mackintosh model as reflecting decreases in the associability of nonpredictive stimuli, as opposed to increases in the associability of predictive stimuli. In fact, while evidence supporting decreases in attentional associability for nonpredictive cues is well established, as yet there exists no evidence providing clear and unequivocal support for an increase in the associability of a predictive stimulus. It is, of course, quite possible that such evidence will be revealed in future. Hence the model presented here does not completely rule out the idea that the associability of predictive cues might increase above some starting value: the associability of a predictive cue is permitted to increase from .9 to a maximum of 1. Nevertheless, the potential for decreases in associability (and consequently the potential for these decreases to affect empirical observations) is much greater: the associability of a nonpredictive cue can decrease from .9 to a minimum of .05.

Simulations reveal that the hybrid model outlined above can account for all of the “standard” phenomena of associative learning. Thus acquisition of simple conditioning follows a negatively accelerated function, as does subsequent extinction of that conditioning. This model combines separable error terms (1 – V_A + V̄_A in Equation 21 and 1 – V̄_A + V_A in Equation 22) with a summed error term, |R|, in a similar fashion to the extended Mackintosh model presented earlier. As such it is able to account for the development of standard conditioned inhibition, conditioned inhibition from a reduced reinforcer, overexpectation, superconditioning, and supernormal conditioning in a very similar way to this earlier model. In addition the model explains the results of Rescorla’s (2000, 2002) studies of the distribution of associative change between the elements of a compound in the same way as this earlier model, again by virtue of combining separable and summed error terms. These predictions have all been confirmed by computational simulations employing exactly the same model parameters as outlined below; space constraints preclude the inclusion of the results of those simulations here.

These are phenomena that any model attempting to provide a satisfactory account of associative change must address. Nevertheless, any model incorporating both separable and summed error terms could account for these phenomena in terms of changes in processing of the US. The true power of the hybrid model lies instead in its ability to account for a wide variety of associability effects: that is, phenomena reflecting changes in processing of CSs as a function of the associative history of those CSs. Here I describe the application of the model to a subset of the most diagnostic associability effects as described earlier; application of the model to the remaining preparations, while in some cases perhaps requiring extensions (e.g., inclusion of a temporal component in a real-time implementation), would be a matter of relative simplicity.

Learned irrelevance refers to the finding that uncorrelated exposure to a CS and US results in slower subsequent conditioning to that CS when it is consistently paired with the US than if it were novel. The hybrid model provides a clear account of the learned irrelevance effect. During uncorrelated CS/US exposure, the CS is a poorer predictor of the US than is the experimental context, and hence the attentional associability (α_CS, determined by the Mackintosh equations) of the CS will fall. Given that even the “CS + context” compound is a relatively poor predictor of the US, the salience associability of the CS (σ_CS, determined by the Pearce–Hall equation) will tend to be maintained at a relatively high value. Nevertheless, the low α of the CS following uncorrelated CS/US exposure will ensure that, when that CS is subsequently paired with the US, learning about it will be slow (although α_CS will increase steadily over the course of conditioning training as the CS is established as a good predictor of reinforcement). This can be compared with a control group for whom the CS is novel at the outset of conditioning, such that it will enter conditioning trials with high attentional and salience associabilities (α_CS = σ_CS = .9), promoting rapid learning about that stimulus.

The earlier discussion of learned irrelevance raised the possibility that it reflected nothing more than the sum of CS and US preexposure effects. The balance of experimental evidence presented there suggested that learned irrelevance is more than the sum of its parts. It is important to verify, then, that the hybrid model sees learned irrelevance as more than a preexposure effect. None of the components of this model (summed error term, separable error term, Mackintosh, and Pearce–Hall) allow for changes in the processing of a US as a result of simple exposure to that US, and hence the hybrid model has no scope for explaining a US preexposure effect. It does, however, predict that CS preexposure will have an effect on subsequent conditioning. While preexposure to a CS in the absence of reinforcement will not affect the attentional associability α_CS, it will lead to a decline in the salience associability σ_CS, as the events following the CS (nothing) are not surprising. Hence nonreinforced exposure will have a detrimental effect on the ability of the CS to enter into associations with the US on subsequent conditioning trials, as compared to a novel CS that has not undergone this decrease in salience associability. In other words, inclusion of a variable salience associability (following the Pearce–Hall approach) allows this hybrid model to account for the well-established empirical phenomenon of latent inhibition (see Lubow, 1989). Note, however, that the retardation in learning resulting from nonreinforced exposure to a stimulus will not be as great as that resulting from uncorrelated CS/US exposure. In the case of nonreinforced exposure the attentional associability of the CS remains high (α_CS = .9), while its salience associability declines to the lower limit of .5. During uncorrelated CS/US exposure, however, the attentional associability of the CS will decline towards its lower limit of .05, while salience associability will also decline, if only slightly (as the CS + context compound is a relatively poor predictor of the outcome). Hence in the former case, the factor limiting the rate of conditioning will be salience associability; in the latter case, it will be attentional associability. And as attentional associability is more potent than salience associability in terms of its ability to slow learning (as discussed earlier), uncorrelated CS/US exposure will result in slower conditioning than nonreinforced CS exposure, with both giving slower conditioning than a novel CS.

These predictions were confirmed by simulation of a simplified learned irrelevance experiment. For the irrelevance condition, preexposure was to randomly intermixed X+, X–, AX+, and AX– trials: In this design, the occurrence of the US is not correlated with the presence or absence of A. This preexposure continued for 20 blocks, with each of the four trial types occurring once per block. The CS-exposure condition employed the same trial types, but with no USs occurring. In the novel control condition there was no preexposure. All groups then received eight A+ conditioning trials. Figure 7 shows the results of this simulation; the data for each group represent the average values for eight simulated subjects. Parameters used for this simulation, and for all other simulations run with this model, were: β_E = .5, β_I = .1, θ_E = .8, θ_I = .1, γ = .1, starting value of α = .9, starting value of σ = .9, λ (US present) = .8, λ (US absent) = 0. It should be noted, however, that the model’s predictions are robust against even large changes in these parameters. Panel A shows the changes in attentional and salience associability respectively undergone by A during the preexposure phase for the irrelevance and CS-exposure groups. Panel B shows the changes in attentional and salience associability during the subsequent conditioning phase for all three groups, and Panel C shows the changes in associative strength undergone by A during this phase.

The results of this simulation confirm the predictions set out above. Nonreinforced preexposure results in a steady decrease in salience associability (σ) while leaving attentional associability (α) unaffected, whereas uncorrelated CS/US exposure drives α to a very low value, with σ maintained at a higher level for longer. As a result, nonreinforced preexposure to a CS results in slower conditioning of that CS than if it were novel, but uncorrelated CS/US exposure leads to slower conditioning still (a result of the fact that, at the outset of conditioning, α is lower for the irrelevance group than is σ for the CS-exposure group). The value of α_A rises for all conditions over the course of conditioning, as A is the best predictor of the outcome on these trials; σ_A rises slightly, as the occurrence of the US on these trials is initially surprising, and then falls again as A comes to predict the US.
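The latent inhibition part of this account can be illustrated with a minimal Python sketch. It is not the simulation that produced Figure 7: it uses a single cue with no context, holds α fixed during conditioning, and assumes that salience associability follows a Pearce–Hall style running average bounded between .5 and 1 and that excitatory change takes the form α · σ · β_E · (1 − V) · |λ − V|. The function names, the bounds, and the 20 CS-alone trials are assumptions made for illustration.

```python
# Illustrative values; the update forms and bounds below are assumptions of this sketch.
BETA_E = 0.5                      # excitatory learning rate (assumed)
GAMMA = 0.1                       # weight of the most recent trial in the sigma update
LAMBDA_US = 0.8                   # asymptote supported by the US
SIGMA_MIN, SIGMA_MAX = 0.5, 1.0   # assumed bounds on salience associability


def update_sigma(sigma, lam, sum_v, gamma=GAMMA):
    """Pearce-Hall style update: sigma tracks the recent summed prediction error."""
    new_sigma = gamma * abs(lam - sum_v) + (1.0 - gamma) * sigma
    return min(SIGMA_MAX, max(SIGMA_MIN, new_sigma))


def preexpose(sigma, n_trials):
    """Nonreinforced CS-alone trials: lambda is 0 and V stays at 0."""
    for _ in range(n_trials):
        sigma = update_sigma(sigma, lam=0.0, sum_v=0.0)
    return sigma


def condition(alpha, sigma, n_trials, lam=LAMBDA_US):
    """A+ trials for a single cue; alpha is held fixed (a simplification)."""
    v = 0.0
    history = []
    for _ in range(n_trials):
        error = lam - v
        delta_v = alpha * sigma * BETA_E * (1.0 - v) * abs(error)
        sigma = update_sigma(sigma, lam, v)   # based on the error before V changes
        v += delta_v
        history.append(v)
    return history


sigma_exposed = preexpose(0.9, n_trials=20)   # falls to the assumed floor
v_exposed = condition(alpha=0.9, sigma=sigma_exposed, n_trials=8)
v_novel = condition(alpha=0.9, sigma=0.9, n_trials=8)

for trial, (novel, exposed) in enumerate(zip(v_novel, v_exposed), start=1):
    print(f"trial {trial}: novel V = {novel:.3f}, preexposed V = {exposed:.3f}")
```

Under these assumptions the preexposed cue starts conditioning with σ at its floor and gains associative strength more slowly than the novel cue on every one of the eight trials, which is the qualitative pattern described for the CS-exposure group.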

Figure 7. Simulation of a simplified learned irrelevance study with the hybrid model. Panel A: Associability changes for irrelevance and CS-exposure groups during Stage 1. Panel B: Associability changes for irrelevance, CS-exposure, and novel control groups during Stage 2. In Panels A and B, open symbols represent salience associability (σ), and closed symbols represent attentional associability (α). Panel C: Net associative strength changes during Stage 2. Development of the A → US association is slowest in group irrelevance, followed by group CS-exposure, with fastest learning in group novel control. Note that A does not enter Stage 2 with zero associative strength for group irrelevance, despite having a noncontingent relationship with the US during Stage 1: This maintenance of weak excitatory strength by a noncontingent cue is a general problem for any model containing a separable error term.

The hybrid model also provides a clear account of blocking. On AB+ trials of an A+, AB+ blocking procedure, three separate mechanisms combine in the hybrid model to yield blocking of learning about the added cue, B. First, the |R| term in Equation 10 is a summed error term and dictates that the associative change undergone by a cue is, in part, influenced by processing of the US. In a blocking contingency, the level of reinforcement occurring on compound trials is already predicted by A; hence |R| will be low, and so little learning about B will occur. Second, as A is a better predictor of the level of reinforcement than is B on compound trials, the attentional associability of B, α_B, will decrease rapidly, drastically reducing the extent to which it engages the learning process. Third, the fact that the outcome is well predicted on compound trials by the presence of A means that the salience associability of B will decline over the course of compound training, as this associability is determined by a summed error term (Equation 16), again slowing learning about B. In a sense, then, the blocking predicted by this model is an effect of Rescorla–Wagner, Mackintosh, and Pearce–Hall models combined.
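The first two of these mechanisms can be seen on the very first compound trial. The Python sketch below is only an illustration: it assumes an excitatory update of the form α · σ · β_E · (1 − V) · |R| with inhibitory strengths at zero, uses a hypothetical pretrained value V_A = .75, and compares the individual prediction errors of A and B without committing to an exact rule for how α then changes. The third mechanism (decline of salience associability over compound training) is not shown.

```python
BETA_E = 0.5      # excitatory learning rate (assumed value)
LAMBDA_US = 0.8   # asymptote supported by the US


def first_compound_trial(v_a, v_b=0.0, alpha_b=0.9, sigma_b=0.9, lam=LAMBDA_US):
    """Return quantities relevant to cue B on the first AB+ trial.

    The form alpha * sigma * beta * (1 - V) * |R| is an assumed reading of the
    learning rule; inhibitory associative strengths are taken to be zero.
    """
    summed_error = lam - (v_a + v_b)     # R: error for the compound as a whole
    delta_v_b = alpha_b * sigma_b * BETA_E * (1.0 - v_b) * abs(summed_error)
    own_error_a = abs(lam - v_a)         # how poorly A alone predicts the US
    own_error_b = abs(lam - v_b)         # how poorly B alone predicts the US
    return {
        "summed_error": summed_error,
        "delta_v_b": delta_v_b,
        "b_is_poorer_predictor": own_error_b > own_error_a,
    }


blocking = first_compound_trial(v_a=0.75)   # A pretrained close to asymptote (hypothetical)
control = first_compound_trial(v_a=0.0)     # A novel on the first compound trial

print("blocking:", blocking)
print("control: ", control)
```

With A pretrained, the summed error is small, so B gains far less strength than in the control case, and B is also the poorer individual predictor, which is the condition under which its attentional associability is expected to fall.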

In addition, the hybrid model is able to account for unblocking by an increase in reinforcer magnitude, again as an effect of Rescorla–Wagner, Mackintosh, and Pearce–Hall components combined. That is, the increased magnitude of the reinforcer on compound trials (1) produces an increased magnitude of the summed error term, |R| (Rescorla–Wagner), (2) leads to an increase in the salience associability of A and B, as the level of reinforcement on compound trials is only poorly predicted by the AB compound (Pearce–Hall), and (3) ensures that the difference in predictiveness of A and B for the intensified US on compound trials will be smaller than that for a control condition employing the same reinforcer magnitude in both stages, such that α_B in the unblocking condition will fall more slowly than that in the control condition (Mackintosh). Moreover, the hybrid model can account for Mackintosh and Turner’s (1971) demonstration that insertion of trials on which the AB compound is followed by the weaker reinforcer before AB → strong US trials results in very little conditioning accruing to B on these latter trials. On these added trials, A is a much better predictor of the level of reinforcement than is B, leading to a sharp decline in α_B, thus preventing B from engaging the learning process to any great degree on subsequent trials with the stronger US. Computational simulations have confirmed the ability of the hybrid model to account for blocking, unblocking by increase in reinforcer magnitude, and the “reblocking” effect demonstrated by Mackintosh and Turner.

So far we have looked at the ability of the hybrid model to account for phenomena that could be just as easily explained by the extended Mackintosh model, presented earlier. The advantage of the hybrid model is that it permits explanation of results that lie beyond the scope of this earlier model. For instance, the inclusion of a variable salience associability allows the hybrid model to account for Hall–Pearce negative transfer (Hall & Pearce, 1979). Figure 8 shows simulation results for a typical Hall–Pearce experiment. Group pretraining receives 20 A → us trials followed by 8 A → US trials; group control receives 8 A → US trials in the absence of any pretraining. For the former group, during the course of pretraining σ_A will decline as the occurrence of the outcome comes to be predicted by the presence of A. A is the best available predictor of the outcome, and hence its attentional associability will rise: Given that the starting value of α_A is near its maximum, however, the effect of this increase will be only slight. On subsequent A → US trials, group pretraining will have a considerably lower σ_A than group control, and only a slightly higher α_A. Consequently, the model predicts that conditioned responding to A should be slower to develop following A → us pretraining than if A were novel: This is, of course, the result seen in Hall and Pearce’s (1979) empirical study.

Figure 8. Simulation of Hall–Pearce negative transfer with the hybrid model. This figure shows the net associative strength of A over the course of Stage 2 A → US training. Learning of the A → US association is more rapid if A is novel (group control) than if it has previously been consistently paired with a weak US (group pretraining).

One area in which the hybrid model might, at first sight, appear to run into trouble is with respect to the various studies demonstrating that, under certain conditions, learning about a stimulus is more rapid when that stimulus is an inaccurate predictor of the events that follow it than when it is an accurate predictor (Kaye & Pearce, 1984; Swan & Pearce, 1988; Wilson et al., 1992). This would seem to run against the idea that poor predictors maintain a low attentional associability (α), effectively “turning off” learning about them. However, let us consider again the study by Wilson et al., which is fairly typical of this type of preparation. To recap, this study employed a partially reinforced serial conditioning procedure in which a light was followed by a tone, and on half of the trials this tone terminated with delivery of the food US. Following this training, subjects were split into two groups: group consistent, for whom training continued as before, and group shift, for whom the tone was omitted on all nonreinforced trials. In the former group the light remained a perfect predictor of the tone, whereas in the latter group the contingency between light and tone moved from 100% in Stage 1 to 50% in Stage 2. On a subsequent test stage in which the light was paired directly with food, Wilson et al. found more rapid conditioning in group shift than in group consistent, indicating that degrading the contingency between light and tone had enhanced the ability of the light to engage the learning process relative to the situation in which that contingency was maintained.

Given the current status of the hybrid model (and indeed the original Pearce–Hall and Mackintosh models) it is not possible to simulate this experiment: These theories are designed to model the development of associations between CSs and USs, not between CSs and other CSs (and it must be the difference in relationship between light and tone that generates the pattern of results seen in Wilson et al.’s, 1992, experiment). Nevertheless, it is possible to make certain observations regarding this experiment from the point of view of the hybrid model. During the first stage, experience of the consistent relationship between light and tone will lead to a decline in the salience associability of the light, as it is an accurate predictor of the events that follow it, but will allow the light to maintain a high attentional associability, as it is the best available predictor of the tone. In the second stage, degrading the contingency between light and tone will act to restore the salience associability of the light to some extent, as it is no longer such an accurate predictor of the events that follow. It is, however, still the best available predictor of the tone on trials on which light and tone occur (note that the tone never occurs in the absence of the light), and hence it will still maintain a high attentional associability (α) throughout this latter stage. Consequently at the outset of the test phase, for group consistent the light will have a high α and a low salience associability (σ), while for group shift the light will have a similar α but a higher σ (as salience associability has been restored in this group). The result will be more rapid learning of the light–food relationship in group shift, as observed experimentally. The issue of the outcome specificity of associability is also germane to this analysis. Recall Mackintosh’s (1973) learned irrelevance study, indicating that attentional associability is, to some extent, outcome specific. Suppose, then, that the attentional associability developed by the light with respect to the tone does not transfer to subsequent learning about the light with respect to food, as tone and food are such dissimilar outcomes. This will attenuate any effect of the difference in light–tone relationship for the two groups on the α of the light that is effective on light–food trials. In other words, even if shifting from a 100% to a 50% contingency between light and tone does have an impact on the α of the light with respect to tone, this will not be manifest in the rate of learning of a light–food association. And as long as we assume that the salience associability of a stimulus is not outcome specific, but is instead a general property of that stimulus, we are still able to account for the more rapid learning of light–food for group shift.

All of these results taken in combination reveal the power of the hybrid model: By integrating two different approaches to changes in CS processing as a result of experience, the model is able to reconcile a number of seemingly opposing demonstrations of the effects of associative history on subsequent learning. Moreover, combining these associability-based processes with mechanisms allowing for modulation of learning in terms of changes in processing of the US extends the model still further. In general, then, the model can be broken down as follows:

1. Attentional associability, α (cf. Mackintosh, 1975). CS-processing, allows explanation of learned irrelevance, Mackintosh and Turner (1971) “reblocking”, overtraining reversal effect, ID shift advantage, etc.

2. Salience associability, σ (cf. Pearce & Hall, 1980). CS-processing, allows explanation of latent inhibition, Hall–Pearce negative transfer, and better learning, under certain conditions, when a stimulus is a poor predictor of following events (e.g., Wilson et al., 1992).

3. Separable error term (1 – V_A + V̄_A in Equation 10, 1 – V̄_A + V_A in Equation 11) (cf. Bush & Mosteller, 1951). US-processing, allows explanation of greater associative change for poorer predictor of outcome as shown by Rescorla (2000, 2001, 2002).

4. Summed error term, |R| (cf. Rescorla & Wagner, 1972). US-processing, allows explanation of conditioned inhibition, overexpectation, superconditioning, and supernormal conditioning.
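One way to see how the four components fit together is to write them as a single trial-level update. The Python sketch below is a reconstruction under stated assumptions rather than a transcription of Equations 10 and 11: it assumes that the four components multiply, that excitatory learning occurs when the summed error R is positive and inhibitory learning when it is negative, and that R is λ minus the net strength of all cues present. The function name, argument names, and default learning rates are hypothetical.

```python
def hybrid_delta_v(alpha, sigma, v_exc, v_inh, lam, sum_v_exc, sum_v_inh,
                   beta_e=0.5, beta_i=0.1):
    """Return (excitatory change, inhibitory change) for one cue on one trial.

    alpha, sigma   -- attentional and salience associability of the cue
    v_exc, v_inh   -- the cue's own excitatory and inhibitory strengths
    sum_v_exc/_inh -- summed strengths of all cues present on the trial
    """
    r = lam - (sum_v_exc - sum_v_inh)          # summed error term R
    if r > 0:
        # Outcome under-predicted: excitatory learning, separable term (1 - V + V_bar)
        return alpha * sigma * beta_e * (1.0 - v_exc + v_inh) * abs(r), 0.0
    if r < 0:
        # Outcome over-predicted: inhibitory learning, separable term (1 - V_bar + V)
        return 0.0, alpha * sigma * beta_i * (1.0 - v_inh + v_exc) * abs(r)
    return 0.0, 0.0


# A novel cue paired with the US (lambda = .8)
novel_reinforced = hybrid_delta_v(0.9, 0.9, 0.0, 0.0, 0.8, 0.0, 0.0)

# A trained cue (V = .8) presented without the US (lambda = 0)
trained_nonreinforced = hybrid_delta_v(0.9, 0.9, 0.8, 0.0, 0.0, 0.8, 0.0)

print(novel_reinforced, trained_nonreinforced)
```

Keeping α and σ as separate multipliers is what lets a cue be slowed either by a history of being a relatively poor predictor (low α) or by a history of being followed by unsurprising events (low σ).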

Future developments of the hybrid model

Alternative or additional components

It should be noted that the precise formulation of the hybrid model offered here is relatively unimportant. That is to say, there are a number of other ways in which these ideas of changes in CS- and US-processing could be integrated. Moreover, the approach taken by the model is not tied to the mechanisms for associability change outlined in this review. I have used the Mackintosh and Pearce–Hall models as components of the hybrid model to illustrate how reconciliation could be achieved simply for convenience: A number of alternative component models are available, and each may bring its own advantages. For example, it would be perfectly possible to employ Wagner’s (1981) SOP model or McLaren et al.’s (1989) elemental model instead of the Pearce–Hall theory to describe changes in salience associability. Either of these components would allow the resulting hybrid model to account for the US-preexposure effect (habituation), the context specificity of latent inhibition, and demonstrations of perceptual learning (see Hall, 1991): All of these phenomena lie beyond the present implementation integrating Mackintosh and Pearce–Hall models. That said, the use of a Pearce–Hall component allows the hybrid model to account for effects of predictive accuracy on salience associability that lie beyond Wagner’s model. Similarly, the attention-shifting mechanisms employed by Kruschke’s (2001) ADIT or “mixture of experts” models could be adopted in place of the Mackintosh theory for modelling changes in attentional associability. These models differ from Mackintosh’s model in that attentional associability is updated before associative strengths are adjusted. In addition, in Kruschke’s models attention influences performance as well as learning. That is, while in Mackintosh’s model responding to a cue is simply given by the cue’s associative strength, in Kruschke’s model responding to a cue is modulated by the attention paid to that cue.

The purpose of the current paper is not to support a particular implementation of the hybrid model over any other. Instead, my aim in presenting a hybrid model is to demonstrate that a number of seemingly opposing effects of associative history on later learning need not be seen as irreconcilable. While attempting to account for all of these effects with a single associability mechanism may prove impossible, if we abandon this extreme parsimony and accept that these preparations may engage a number of different mechanisms, each of which plays its own part in the learning process, a successful reconciliation may well be possible to achieve. That said, the discussion above indicates that a clear direction for future development of the hybrid model is in the identification and refinement of the most suitable components to use. Should associative strengths and associabilities interact to determine performance? Should the model employ a Pearce–Hall module, a Wagner (1981) module, or separate modules implementing the mechanisms of associability change suggested by both theories? This latter suggestion raises the question of when the increased explanatory power offered by the hybrid approach justifies the added complexity and reduction in parsimony that go with it. Given the conflicting empirical evidence presented in the current paper, the theoretical distinction between attentional and salience associabilities employed in the current hybrid model seems well justified on these grounds. Whether the available evidence demands further distinctions within the domain of CS-processing mechanisms remains to be seen.

Novel predictions

This agnosticism with regard to the best possible implementation of the hybrid model means that novel predictions derived from the current implementation should be regarded with some caution. That is, a failure of the specific Mackintosh / Pearce–Hall hybrid model offered here need not undermine the idea of a hybrid model of associability processes in general. For instance, the current implementation makes a novel prediction with regard to the phenomenon of relative validity and related contingencies (Wagner, Logan, Haberlandt, & Price, 1968; Wasserman, 1974). Consider the three training conditions below (all trials in each condition are intermixed; +/– indicates partial reinforcement, i.e., reinforcement on 50% of presentations):

1. AX+ BX–
2. AX+/– BX+/–
3. AX+ BX+


The question of interest is what effect this training has on the processing of cue X. In Conditions 2 and 3, X is the best available predictor of reinforcement (reinforcement sometimes occurs in the absence of A or B, but never in the absence of X; as such the contingency between X and US is higher than that between A and US or B and US). In Condition 1, A and B are better predictors of the following events (reinforcement and nonreinforcement, respectively) than is X. As a result, in Conditions 2 and 3 X will maintain a high attentional associability, while in Condition 1 this will fall. What about salience associability? In Condition 3 all of A, B, and X are consistently reinforced, and as such the salience associability of all three will fall rapidly. In Condition 1, A and B are consistent predictors of following events, but the fact that X is inconsistently reinforced will mean that the compounds are slower to become accurate predictors than in Condition 3, and as such salience associability will fall more slowly in Condition 1 than in Condition 3. Finally, partial reinforcement in Condition 2 will ensure that the events following AX and BX compounds remain relatively poorly predicted throughout training. Hence X will maintain a higher salience associability in this condition than in either of the others.

According to the Mackintosh model, processing of X is solely a function of its attentional associability. As such this model predicts greatest processing of X following training in Conditions 2 and 3, and less processing in Condition 1. The Pearce–Hall model, relying solely on salience associability, predicts greatest processing of X following training in Condition 2, less in Condition 1, and less still in Condition 3. The hybrid model, on the other hand, employs both attentional and salience associability to determine processing of X. As a result it predicts that while a cue that is a better predictor of following events than are other presented cues will maintain strong processing, stronger processing will be maintained if these following events are surprising. This seemingly paradoxical situation is manifest in Condition 2: X is a better predictor of following events than is either A or B, but these events are themselves surprising. As such the hybrid model predicts greatest processing of X following Condition 2, less following Condition 3, and less still following Condition 1 (the ordering of these latter conditions being determined by the greater importance afforded to attentional associability than salience associability; see above). Hence the three theories considered (Mackintosh, Pearce–Hall, and hybrid model) make different predictions about the ordering of processing of X following training in these three conditions. If some way of testing this processing of X following training, independent of X’s current associative strength, could be found (perhaps by training with a novel reinforcer or a reinforcer of increased magnitude), this novel prediction of the hybrid model would be open to test.

A further novel prediction of the hybrid model can be made with regard to the training conditions shown in Table 2: The two conditions in this table bear similarities to the experimental contingencies employed by Hall and Pearce (1979) and Mackintosh and Turner (1971), respectively. The focus of this design is the relative magnitude of associative change undergone by A and B during Stage 3. In Condition 1, A is trained as a consistent predictor of a weak outcome in Stages 1 and 2. According to the Mackintosh model, this will allow A to maintain a high attentional associability. In Stage 3 a compound of A and a novel cue, B, is paired with a stronger outcome. Given A’s high attentional associability at the end of Stage 1, the Mackintosh model is constrained to predict that the associative change undergone by A will be at least as great as that undergone by B (which, as a novel cue, will have the starting value of α at the outset of Stage 3) on these AB → US trials. In Condition 2, during Stage 2 B is a poorer predictor of the weak outcome than is A. This should lead to a decline in α_B, while α_A is maintained at a higher level. As a result, A should undergo greater associative change than B on Stage 3 trials.

TABLE 2
Basic design of a study to test a novel prediction of the hybrid model

Condition    Stage 1    Stage 2    Stage 3
1            A → us     A → us     AB → US
2            A → us     AB → us    AB → US

Note: us = weak outcome; US = strong outcome.

The Pearce–Hall model makes different predictions. In Condition 1, consistent pairings of A with the weak outcome will lead to a decline in A’s salience associability. Thus on the first Stage 3 AB → US trial (before the associabilities of A and B are updated) B will undergo greater change than will A. Recall that a summed error term is used to determine salience associability in the Pearce–Hall model (Equation 16). If γ = 1 (i.e., associability is dictated only by events on the immediately preceding trial), then following this first trial A and B will have equal associabilities, such that all subsequent changes in the associative strengths of A and B will be equal. If γ < 1 then B’s advantage in terms of associability will persist for longer, gradually decreasing as the “moving window” average of associability moves along. So unlike the Mackintosh model, Pearce–Hall predicts if anything a greater associative change in B than in A over Stage 3 as a whole (with smaller values of γ giving rise to greater differences between B and A). In Condition 2, A will begin Stage 2 AB → us trials with a lower salience associability than B. Again bearing in mind the summed error term determining associability, if γ = 1 then the salience associability of A and B will be equal following Stage 2, and hence both cues will undergo identical associative change during Stage 3. If γ < 1, then B’s higher associability at the outset of Stage 2 will persist for longer, such that B could begin Stage 3 with a higher salience associability than A and hence will undergo greater associative change than A on AB → US trials.
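The role of γ in this argument can be made concrete with a short Python sketch. It assumes that salience associability is updated as a weighted average of the absolute summed error on the current trial (weight γ) and the cue's previous associability (weight 1 − γ), which is one reading of the “moving window” described above; the starting associabilities and the sequence of summed errors are hypothetical, and the model's bounds on associability are omitted.

```python
def track_sigma(sigma_a, sigma_b, summed_errors, gamma):
    """Update the salience associabilities of two cues trained in compound.

    Both cues experience the same summed error on every trial, so any difference
    between them can only come from their different starting values.
    """
    history = []
    for error in summed_errors:
        sigma_a = gamma * abs(error) + (1.0 - gamma) * sigma_a
        sigma_b = gamma * abs(error) + (1.0 - gamma) * sigma_b
        history.append((sigma_a, sigma_b))
    return history


# Hypothetical summed errors on successive AB -> US trials
errors = [0.6, 0.6, 0.5, 0.5]

# A starts with low associability (pretrained), B with high associability (novel)
fast = track_sigma(0.5, 0.9, errors, gamma=1.0)
slow = track_sigma(0.5, 0.9, errors, gamma=0.1)

for label, history in (("gamma = 1.0", fast), ("gamma = 0.1", slow)):
    print(label, [(round(a, 3), round(b, 3)) for a, b in history])
```

With γ = 1 the two associabilities are identical after a single trial, whereas with γ = .1 the initial gap of .4 shrinks by a factor of .9 per trial, so B keeps its advantage throughout the sequence.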

To summarize, in Conditions 1 and 2 of Table 2 the Mackintosh model predicts, if anything, ΔV_A > ΔV_B, while in both conditions the Pearce–Hall model predicts, if anything, ΔV_B > ΔV_A. The hybrid model, on the other hand, predicts different results for Conditions 1 and 2. Recall that in the hybrid model as presented here, the attentional associability of good predictors does not rise greatly from its starting value. Consequently, in Condition 1, A and B will begin Stage 3 with similar attentional associabilities, but the salience associability of A will be lower than that of B (see discussion of Pearce–Hall model with regard to this condition above). Hence the hybrid model predicts ΔV_B > ΔV_A during Stage 3 of Condition 1. Just as for the Mackintosh model as discussed above, Stage 2 AB → us trials of Condition 2 will lead to a rapid decline in α_B. While the salience associability of B may end Stage 2 slightly higher than that of A (as for the Pearce–Hall model), this will be easily outweighed by the difference in attentional associabilities, with α_A > α_B. Accordingly the hybrid model predicts ΔV_A > ΔV_B during Stage 3 of Condition 2. So by combining Mackintosh and Pearce–Hall components in a single model, the hybrid model is able to predict a pattern of results that neither theory alone could generate. Once again, if the associative change undergone by A and B during Stage 3 could be measured (perhaps using the technique developed by Rescorla, 2000), this novel prediction would be open to test.
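The logic of these three sets of predictions can be laid out in a small Python sketch. The associability values are hypothetical numbers chosen only to respect the verbal description (similar α but lower σ for A in Condition 1; much lower α and slightly higher σ for B in Condition 2), and the comparison looks only at the associability factor that each model feeds into its learning rule, leaving the error terms aside.

```python
# Hypothetical (alpha, sigma) values for each cue at the start of Stage 3
stage3_start = {
    "Condition 1": {"A": (1.0, 0.5), "B": (0.9, 0.9)},
    "Condition 2": {"A": (1.0, 0.5), "B": (0.05, 0.55)},
}

# Associability factor used by each model (error terms deliberately omitted)
models = {
    "Mackintosh": lambda alpha, sigma: alpha,
    "Pearce-Hall": lambda alpha, sigma: sigma,
    "hybrid": lambda alpha, sigma: alpha * sigma,
}


def favoured_cue(cues, weight):
    """Return which cue has the larger associability factor under a model."""
    a = weight(*cues["A"])
    b = weight(*cues["B"])
    if a == b:
        return "A = B"
    return "A" if a > b else "B"


predictions = {
    name: {cond: favoured_cue(cues, weight) for cond, cues in stage3_start.items()}
    for name, weight in models.items()
}

for name, by_condition in predictions.items():
    print(name, by_condition)
```

Only the multiplicative combination favours B in Condition 1 and A in Condition 2; each single-process factor favours the same cue in both conditions.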

Associability and representation

Both Mackintosh and Pearce–Hall models of CS processing employ elemental schemes of representation. That is, a compound AB is viewed as being composed of separable A and B elements, each of which has its own associability (and associative strength). This elemental view of representation is also incorporated into the hybrid model as presented here. In recent work, however, George and Pearce (1999; see also Oswald et al., 2001) have demonstrated that an extreme elemental view of associability (in which each cue develops its own, independent, associability) may not provide the most satisfactory approach to modelling empirical effects of associative history on subsequent learning. Their experiment employed a biconditional discrimination of the form AB+ CD+ AC– BD–, in which the outcome of a trial is not determined by the presence or absence of individual stimuli. According to the elemental view of associability outlined above, this discrimination should result in a low attentional associability to each of the four stimuli. Nevertheless, all four stimuli are relevant to the solution of the discrimination in that they belong to configurations that are predictive of trial outcome; while neither A nor B alone is predictive of outcome, for example, the “AB” configuration certainly is. Contrary to the predictions of the elemental view, George and Pearce (1999) found evidence that training on a biconditional discrimination allowed stimuli A–D to maintain a high attentional associability. This finding instead supports the idea that configurations of stimuli can develop their own associability, and this associability will then generalize to the constituent elements of that configuration. Thus it would seem that, at some level, associability is a property of configurations, rather than (or, perhaps more likely, in addition to) individual cues. Clearly these results are beyond the scope of the purely elemental hybrid model as it currently stands. As such, future development of the hybrid model will need to address the issue of the proper characterization of stimulus representation with regard to associability (see Buhusi & Schmajuk, 1996; Kruschke, 1992; Pearce et al., 1998, for potential solutions to the problem of “configural associability” that might guide future development of the hybrid model in this regard).

CONCLUSION

Associative learning theory has come a long way since the early linear operator models proposed by Bush and Mosteller (1951), Estes (1950), and their contemporaries. In developing new models, the (very sensible) tendency has been to focus on one aspect of learning and build a model centred on that aspect. For instance, the Rescorla–Wagner model focuses exclusively on changes in US-processing, while the Pearce–Hall model focuses exclusively on changes in CS-processing. These models, while able to provide an account of phenomena within their relatively narrow scope of expertise, fail to provide a full and satisfactory account of the varying effects of associative history on the associative change undergone by cues on a given learning episode. With the wealth of experimental evidence that has now built up on this subject, it may be time to take the next step in the development of associative learning theories. That is, perhaps it is time to begin integrating the domain-specific models of learning developed over the last 50 years in an attempt to generate “holistic” models that are consequently able to capture more of the data. The hybrid model presented here represents one effort to do just that.

REFERENCES

Baker, A. G., & Mackintosh, N. J. (1977). Excitatory and inhibitory conditioning following uncorrelated presentations of CS and UCS. Animal Learning & Behavior, 5, 315–319.

Baker, A. G., & Mackintosh, N. J. (1979). Preexposure to the CS alone, or CS and US uncorrelated: Latent inhibition, blocking by context or learned irrelevance? Learning and Motivation, 10, 278–294.

Bennett, C. H., Maldonado, A., & Mackintosh, N. J. (1995). Learned irrelevance is not the sum of exposure to CS and US. Quarterly Journal of Experimental Psychology, 48B, 117–128.

Bennett, C. H., Wills, S. J., Oakeshott, S. M., & Mackintosh, N. J. (2000). Is the context specificity of latent inhibition a sufficient explanation of learned irrelevance? Quarterly Journal of Experimental Psychology, 53B, 239–253.

Bonardi, C., & Hall, G. (1996). Learned irrelevance: No more than the sum of CS and US preexposure effects? Journal of Experimental Psychology: Animal Behavior Processes, 22, 183–191.

Bonardi, C., & Ong, S. Y. (2003). Learned irrelevance: A contemporary overview. Quarterly Journal of Experimental Psychology, 56B, 80–89.

Bouton, M. E. (1993). Context, time, and memory retrieval in the interference paradigms of Pavlovian learning. Psychological Bulletin, 114, 80–99.

Bouton, M. E. (1994). Conditioning, remembering, and forgetting. Journal of Experimental Psychology: Animal Behavior Processes, 20, 219–231.

Brandon, S. E., Vogel, E. H., & Wagner, A. R. (2003). Stimulus representation in SOP: I. Theoretical rationalization and some implications. Behavioural Processes, 62, 5–25.

Buhusi, C. V., & Schmajuk, N. A. (1996). Attention, configuration and hippocampal function. Hippocampus, 6, 621–642.

Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313–323.

Channell, S., & Hall, G. (1983). Contextual effects in latent inhibition with an appetitive conditioning procedure. Animal Learning & Behavior, 11, 67–74.

Cotton, M. M., Goodall, G., & Mackintosh, N. J. (1982). Inhibitory conditioning resulting from a reduction in the magnitude of reinforcement. Quarterly Journal of Experimental Psychology, 34B, 163–180.

De Houwer, J., & Beckers, T. (2002). A review of recent developments in research and theories on human contingency learning. Quarterly Journal of Experimental Psychology, 55B, 289–310.

Dickinson, A., Hall, G., & Mackintosh, N. J. (1976). Surprise and the attenuation of blocking. Journal of Experimental Psychology: Animal Behavior Processes, 2, 313–322.

Dickinson, A., & Mackintosh, N. J. (1979). Reinforcer specificity in the enhancement of conditioning by posttrial surprise. Journal of Experimental Psychology: Animal Behavior Processes, 5, 162–177.

Dickinson, A., Shanks, D. R., & Evenden, J. L. (1984). Judgement of act–outcome contingency: The role of selective attribution. Quarterly Journal of Experimental Psychology, 36A, 29–50.

Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94–107.

George, D. N., & Pearce, J. M. (1999). Acquired distinctiveness is controlled by stimulus relevance not correlation with reward. Journal of Experimental Psychology: Animal Behavior Processes, 25, 363–373.

Hall, G. (1991). Perceptual and associative learning. Oxford: Oxford University Press.

Hall, G., & Minor, H. (1984). A search for context–stimulus associations in latent inhibition. Quarterly Journal of Experimental Psychology, 36B, 145–169.

Hall, G., & Pearce, J. M. (1979). Latent inhibition of a CS during CS–US pairings. Journal of Experimental Psychology: Animal Behavior Processes, 3, 31–42.

Hall, G., & Pearce, J. M. (1982). Restoring the associability of a pre-exposed CS by a surprising event. Quarterly Journal of Experimental Psychology, 34B, 127–140.

Hearst, E. (1972). Some persistent problems in the analysis of conditioned inhibition. In M. S. Halliday (Ed.), Inhibition and learning (pp. 5–39). London: Academic Press.

Holland, P. C. (1984). Unblocking in Pavlovian appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 10, 476–497.


Holland, P. C. (1988). Excitation and inhibition in unblocking. Journal of Experimental Psychology: Animal Behavior Processes, 14, 261–279.

Kamin, L. J. (1969). Predictability, surprise, attention and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior. New York: Appleton-Century-Crofts.

Kaye, H., & Pearce, J. M. (1984). The strength of the orienting response during Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 10, 90–109.

Kendler, T. S. (1971). Continuity theory and cue dominance. In J. T. Spence (Ed.), Essays in neobehaviorism: A memorial volume to Kenneth W. Spence. New York: Appleton-Century-Crofts.

Khallad, Y., & Moore, J. (1996). Blocking, unblocking, and overexpectation in autoshaping with pigeons. Journal of the Experimental Analysis of Behavior, 65, 575–591.

Konorski, J. (1967). Integrative activity of the brain. Chicago: University of Chicago Press.

Kremer, E. F. (1978). The Rescorla–Wagner model: Losses of associative strength in compound conditioned stimuli. Journal of Experimental Psychology: Animal Behavior Processes, 4, 22–36.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.

Kruschke, J. K. (2001). Towards a unified model of attention in associative learning. Journal of Mathematical Psychology, 45, 812–863.

Kruschke, J. K., & Blair, N. J. (2000). Blocking and backward blocking involve learned inattention. Psychonomic Bulletin & Review, 7, 636–645.

Lattal, K. M., & Nakajima, S. (1998). Overexpectation in appetitive Pavlovian and instrumental conditioning. Animal Learning & Behavior, 26, 351–360.

Le Pelley, M. E., & McLaren, I. P. L. (2003). Learned associability and associative change in human causal learning. Quarterly Journal of Experimental Psychology, 56B, 68–79.

Lochmann, T., & Wills, A. J. (2003). Predictive history in an allergy prediction task. Proceedings of EuroCogSci 03: The European Conference of the Cognitive Science Society (pp. 217–222). Mahwah, NJ: Lawrence Erlbaum Associates Inc.

Lovibond, P. F., Preston, G. C., & Mackintosh, N. J. (1984). Context specificity of conditioning and latent inhibition. Journal of Experimental Psychology: Animal Behavior Processes, 10, 360–375.

Lubow, R. E. (1989). Latent inhibition and conditioned attention theory. Cambridge, UK: Cambridge University Press.

Mackintosh, N. J. (1969). Further analysis of the overtraining reversal effect. Journal of Comparative and Physiological Psychology, 67, No. 2, Part 2.

Mackintosh, N. J. (1973). Stimulus selection: Learning to ignore stimuli that predict no change in reinforcement. In R. A. Hinde & J. S. Hinde (Eds.), Constraints on learning (pp. 75–96). London: Academic Press.

Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298.

Mackintosh, N. J. (1978). Cognitive or associative theories of conditioning: Implications of an analysis of blocking. In H. Fowler, W. K. Honig, & S. H. Pulse (Eds.), Cognitive processes in animal behavior (pp. 155–175). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Mackintosh, N. J., & Cotton, M. M. (1985). Conditioned inhibition from reinforcement reduction. In N. E. Spear (Ed.), Information processing in animals: Conditioned inhibition. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Mackintosh, N. J., & Little, L. (1969). Intradimensional and extradimensional shift learning by pigeons. Psychonomic Science, 14, 5–6.

Mackintosh, N. J., & Turner, C. (1971). Blocking as a function of novelty of CS and predictability of UCS. Quarterly Journal of Experimental Psychology, 23, 359–366.

Matzel, L. D., Schachtman, T. R., & Miller, R. R. (1988). Learned irrelevance exceeds the sum of CS-preexposure and US-preexposure deficits. Journal of Experimental Psychology: Animal Behavior Processes, 14, 311–319.

McLaren, I. P. L., Kaye, H., & Mackintosh, N. J. (1989). An associative theory of the representation of stimuli: Applications to perceptual learning and latent inhibition. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications for psychology and neurobiology (pp. 102–130). Oxford, UK: Oxford University Press.

Miller, R. R., & Matzel, L. D. (1988). The comparator hypothesis: A response rule for the expression of associations. The Psychology of Learning and Motivation, 22, 51–92.

Moore, J. W., & Stickney, K. J. (1985). Antiassociations: Conditioned inhibition in attentional–associative networks. In R. R. Miller & N. E. Spear (Eds.), Information processing in animals: Conditioned inhibition. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.


Navarro, J. I., Hallam, S. C., Matzel, L. D., & Miller, R. R. (1989). Superconditioning and overshadowing. Learning and Motivation, 20, 130–152.

Oswald, C. J. P., Yee, B. K., Rawlins, J. N. P., Bannerman, D. B., Good, M., & Honey, R. C. (2001). Involvement of the entorhinal cortex in a process of attentional modulation: Evidence from a novel variant of an IDS/EDS procedure. Behavioral Neuroscience, 115, 841–849.

Pearce, J. M. (1994). Similarity and discrimination: A selective review and a connectionist model. Psychological Review, 101, 587–607.

Pearce, J. M., & Bouton, M. E. (2001). Theories of associative learning in animals. Annual Review of Psychology, 52, 111–139.

Pearce, J. M., George, D. N., & Redhead, E. S. (1998). The role of attention in the solution of conditional discriminations. In N. A. Schmajuk & P. C. Holland (Eds.), Occasion setting: Associative learning and cognition in animals. Washington, DC: American Psychological Association.

Pearce, J. M., & Hall, G. (1980). A model for Pavlovian conditioning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.

Pearce, J. M., Kaye, H., & Hall, G. (1981). Predictive accuracy and stimulus associability. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior: Acquisition (Vol. 3). Cambridge, MA: Ballinger.

Pearce, J. M., Nicholas, D. J., & Dickinson, A. (1982). Loss of associability by a conditioned inhibitor. Quarterly Journal of Experimental Psychology, 34B, 149–162.

Pearce, J. M., & Redhead, E. S. (1995). Supernormal conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 21, 155–165.

Prados, J., Redhead, E. S., & Pearce, J. M. (1999). Active preexposure enhances attention to the landmarks surrounding a Morris swimming pool. Journal of Experimental Psychology: Animal Behavior Processes, 25, 451–460.

Randich, A., & LoLordo, V. M. (1979). Associative and non-associative theories of the UCS preexposure phenomenon: Implications for Pavlovian conditioning. Psychological Bulletin, 5, 25–28.

Redhead, E. S., Prados, J., & Pearce, J. M. (2001). The effects of pre-exposure on escape from a Morris pool. Quarterly Journal of Experimental Psychology, 54B, 353–367.

Reid, L. S. (1953). The development of noncontinuity behavior through continuity learning. Journal of Experimental Psychology, 46, 107–112.

Rescorla, R. A. (1969a). Conditioned inhibition of fear resulting from negative CS–US contingencies. Journal of Comparative and Physiological Psychology, 67, 504–509.

Rescorla, R. A. (1969b). Pavlovian conditioned inhibition. Psychological Bulletin, 72, 77–94.

Rescorla, R. A. (1970). Reduction in the effectiveness of reinforcement after prior excitatory conditioning. Learning and Motivation, 1, 372–381.

Rescorla, R. A. (1971). Variations in the effectiveness of reinforcement and nonreinforcement following prior inhibitory conditioning. Learning and Motivation, 2, 113–123.

Rescorla, R. A. (2000). Associative changes in excitors and inhibitors differ when they are conditioned in compound. Journal of Experimental Psychology: Animal Behavior Processes, 26, 428–438.

Rescorla, R. A. (2001). Unequal associative changes when excitors and neutral stimuli are conditioned in compound. Quarterly Journal of Experimental Psychology, 54B, 53–68.

Rescorla, R. A. (2002). Effect of following an excitatory-inhibitory compound with an intermediate reinforcer. Journal of Experimental Psychology: Animal Behavior Processes, 28, 163–174.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.

Roberts, A. C., Robbins, T. W., & Everitt, B. J. (1988). The effects of intradimensional and extradimensional shifts on visual discrimination learning in humans and non-human primates. Quarterly Journal of Experimental Psychology, 40B, 321–341.

Rodriguez, G., Lombas, S., & Alonso, G. (2002, March). Previous blocking trials impede conditioning of the added element when US intensity is increased. Poster presented at the Sixth Associative Learning Symposium, Gregynog, Wales.


Rosas, J. M., & Bouton, M. E. (1997). Additivity of the effects of retention interval and context change on latent inhibition: Toward resolution of the context forgetting paradox. Journal of Experimental Psychology: Animal Behavior Processes, 23, 283–294.

Schmajuk, N. A., & Moore, J. W. (1985). Real-time attentional models for classical conditioning and the hippocampus. Physiological Psychology, 13, 278–290.

Schwartz, R. M., Schwartz, M., & Teas, R. C. (1971). Optional intradimensional and extradimensional shifts in the rat. Journal of Comparative and Physiological Psychology, 77, 470–475.

Seraganian, P. (1979). Extradimensional transfer in the easy-to-hard effect. Learning and Motivation, 10, 39–57.

Shepp, B. E., & Schrier, A. M. (1969). Consecutive intradimensional and extradimensional shifts in monkeys. Journal of Comparative and Physiological Psychology, 67, 199–203.

Suret, M. B. S., & McLaren, I. P. L. (2003). Representation and discrimination on an artificial dimension. Quarterly Journal of Experimental Psychology, 56B, 30–42.

Sutherland, N. S., & Mackintosh, N. J. (1971). Mechanisms of animal discrimination learning. New York: Academic Press.

Swan, J. A., & Pearce, J. M. (1988). The orienting response as an index of stimulus associability in rats. Journal of Experimental Psychology: Animal Behavior Processes, 4, 292–301.

Swartzentruber, D., & Bouton, M. E. (1986). Contextual control of negative transfer produced by prior CS–US pairings. Learning and Motivation, 17, 366–385.

Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplements (No. 8) [entire issue].

Trobalon, J. B., Miguelez, D., McLaren, I. P. L., & Mackintosh, N. J. (2003). Intradimensional and extradimensional shifts in spatial learning. Journal of Experimental Psychology: Animal Behavior Processes, 29, 143–152.

Turrisi, F. D., Shepp, B. E., & Eimas, P. D. (1969). Intra- and extra-dimensional shifts with constant- and variable-irrelevant dimensions in the rat. Psychonomic Science, 14, 19–20.

Wagner, A. R. (1971). Elementary associations. In J. T. Spence (Ed.), Essays in neobehaviorism: A memorial volume to Kenneth W. Spence. New York: Appleton-Century-Crofts.

Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behaviour. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 5–47). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Wagner, A. R. (2003). Context-sensitive elemental theory. Quarterly Journal of Experimental Psychology, 56B, 7–29.

Wagner, A. R., & Brandon, S. E. (2001). A componential theory of Pavlovian conditioning. In S. B. Klein (Ed.), Handbook of contemporary learning theories. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Wagner, A. R., Logan, F. A., Haberlandt, K., & Price, T. (1968). Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 76, 171–180.

Wagner, A. R., Mazur, J. E., Donegan, N. H., & Pfautz, P. L. (1980). Evaluation of blocking and conditioned inhibition to a CS signalling a decrease in US intensity. Journal of Experimental Psychology: Animal Behavior Processes, 6, 376–385.

Wasserman, E. A. (1974). Stimulus–reinforcer predictiveness and selective discrimination learning in pigeons. Journal of Experimental Psychology, 103, 284–297.

Whitney, L., & White, K. G. (1993). Dimensional shift and the transfer of attention. Quarterly Journal of Experimental Psychology, 46B, 225–252.

Williams, B. A., & McDevitt, M. A. (2002). Inhibition and superconditioning. Psychological Science, 13, 454–459.

Wilson, P. N., Boumphrey, P., & Pearce, J. M. (1992). Restoration of the orienting response to a light by a change in its predictive accuracy. Quarterly Journal of Experimental Psychology, 44B, 17–36.

Original manuscript received 30 June 2003

Accepted revision received 1 September 2003

