Increased conditioned response to the CS as a result of being paired with the US.
Acquisition
CS→US or CS+
Eyeblink Conditioning Example
[Figure: % CRs (0-100) across blocks of 10 trials (1-9); the percentage of CRs increases across acquisition.]
Formation of a CS-US association
What Causes Acquisition?
CS → US
Decreased conditioned response to a previously reinforced CS as a result of nonreinforced presentations (i.e., in the absence of the US).
Extinction
CS→no US or CS−
[Figure: acquisition then extinction; % CRs across blocks of 10 trials (1-14); the CS+ curve rises during acquisition and declines during extinction, while the CS− curve stays low.]
• Unlearning – weakening the CS-US association
• Inhibition – new learning that interferes with the expression of the CS-US association
What Causes Extinction?
Prediction – if extinction is due to inhibition of an intact CS-US association, then certain stimulus manipulations should recover the CR.
Increased conditioned response to the CS if a delay is interpolated between extinction sessions. (Pavlov, 1927)
Spontaneous Recovery
[Figure: % CRs across blocks of 10 trials; acquisition followed by extinction sessions. Responding declines within each extinction session but partially recovers at the start of the next session (spontaneous recovery).]
Disinhibition
• Following light→food pairings, Pavlov initiated extinction: during the extinction trials the dog stopped salivating to the light CS.
• Pavlov then presented a new, novel stimulus, e.g., a clicker, during the light CS.
• The dog salivated, suggesting a release from inhibition, or disinhibition.
Disinhibition
Light→Food    Light−    Light + Clicker
Conditioned Inhibition
• Pavlov discovered conditioned inhibition.
• A conditioned inhibitor is a stimulus that inhibits the conditioned response.
• Interspersed light→food trials with light+tone→no US trials.
• Abbreviated A+/AX-. A = light, X = tone, + and – represent food and no food, respectively.
Conditioned Inhibition Procedure
A→US  AX−  A→US  A→US  A→US  A→US  AX−  A→US  AX−  (A+ and AX− trials interspersed)
Conditioned Inhibition
[Figure: CR across A+/AX− trials; CR to A rises while CR to AX stays low.]
Summation Test for CI
[Figure: summation test after A+/AX− and B+ training; CR is high to A, low to AX, and reduced to the transfer compound BX, showing that X inhibits responding to another excitor.]
Retardation Test for CI
[Figure: retardation test after A+/AX− training, then X+ vs. Y+ training; the CR to X emerges more slowly than the CR to the control stimulus Y, showing that X must first lose its inhibition.]
Second-Order Conditioning
• Pavlov discovered the phenomenon of second-order conditioning (SOC), which uses a procedure similar to that for conditioned inhibition.
• A+/AX− training.
• However, the number of AX− trials is critical:
– Few AX− trials lead to SOC.
– Many AX− trials lead to conditioned inhibition.
• SOC is also typically produced in two phases: A+ training followed by AX− training.
Second-Order Conditioning vs. Conditioned Inhibition

Design of Conditioned Inhibition
Phase 1: A+/AX−   Test X: CI
Many AX− trials (tens to hundreds)

Design of Second-Order Conditioning
Phase 1: A+   Phase 2: AX−   Test X: CR
Few AX− trials (typically not more than 8-10)
Factors in Conditioning
Contiguity: The closer two stimuli are in space and time, the stronger the conditioned response.
Salience: More intense or noticeable stimuli condition more rapidly.
Contiguity
Delay conditioning is typically stronger than trace conditioning.
[Figure: CR by group; Delay > Trace > Control.]
Delay:   CS→US
Trace:   CS ------------> US (gap between CS offset and US)
Control: CS ~US (unpaired)
Salience
A more intense CS conditions faster than a less intense CS.
[Figure: CR across trials; the 80 Hz tone supports faster conditioning than the 60 Hz tone.]
Simple Model of Associative Learning
ΔV_CS = αβ(λ − V_CS)
Bush & Mosteller (1955)
ΔV_CS = change in associative strength of the CS
V_CS = associative strength of the CS
λ = asymptote of learning
Learning-rate parameters:
α = CS salience (0-1; 0 = no CS)
β = US salience (0-1; 0 = no US)
V_{n+1} = V_n + ΔV_{n+1}
Bush & Mosteller
λ = 1.0, β = .5, α(60 db) = .2, α(80 db) = .4

Trial | ΔV(60 db)                | V(60 db) | ΔV(80 db)                | V(80 db)
1     | .2 × .5(1 − 0) = .100    | .100     | .4 × .5(1 − 0) = .200    | .200
2     | .2 × .5(1 − .10) = .090  | .190     | .4 × .5(1 − .20) = .160  | .360
3     | .2 × .5(1 − .19) = .081  | .271     | .4 × .5(1 − .36) = .128  | .488
4     | .2 × .5(1 − .271) = .073 | .344     | .4 × .5(1 − .488) = .102 | .590
5     | .2 × .5(1 − .344) = .066 | .410     | .4 × .5(1 − .590) = .082 | .672
6     | .2 × .5(1 − .410) = .059 | .469     | .4 × .5(1 − .672) = .066 | .738

ΔV_CS = αβ(λ − V_CS);  V_{n+1} = V_n + ΔV_{n+1}
Bush & Mosteller - Acquisition
[Figure: associative strength (0-1) across trials 1-31; the 80 db curve rises faster and approaches λ sooner than the 60 db curve.]
Bush & Mosteller - Extinction
[Figure: associative strength (0-1) across trials 1-31; both the 60 db and 80 db curves decline toward zero.]
ΔV_CS = αβ(λ − V_CS), with λ = 0 during extinction (no US).
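Extinction falls out of the same update by setting λ = 0: V shrinks geometrically toward zero. A minimal sketch (my illustration; the starting value is the 80 db level reached after six acquisition trials in the table):

```python
def extinguish(v0, alpha, beta, n_trials):
    """With lam = 0, each trial shrinks V: dV = alpha*beta*(0 - V)."""
    v, history = v0, []
    for _ in range(n_trials):
        v += alpha * beta * (0.0 - v)
        history.append(v)
    return history

# Start extinction from V = .738 (80 db CS after 6 acquisition trials)
curve = extinguish(v0=0.738, alpha=0.4, beta=0.5, n_trials=6)
print(round(curve[-1], 3))  # V decays geometrically toward zero
```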
Factors in Conditioning
Contiguity: The closer two stimuli are in space and time, the stronger the association between them can be.
Salience: More intense or noticeable stimuli condition more rapidly.
Contingency: The higher the correlation between two stimuli, the stronger the conditioned response.
Cue-Competition Effects
When multiple CSs are presented together during conditioning, responding to one (or more) of them is reduced when it is presented alone on a probe test.
Overshadowing Effect
Overshadowing – reduced CR to a CS if it is paired with the US in the presence of a more salient CS, i.e., AX→US, where A is more salient than X.

Design:
Group          Treatment   Test X
Overshadow     AX+         cr
Acq. Control   X+          CR

Pavlov, 1927
Overshadowing (Blaisdell et al., 1998)

Group        Training   Test X
Overshadow   AX→+       cr
Control      X→+        CR
Blocking Effect
Blocking – reduced CR to a CS if it is paired with the US in the presence of a previously established CS, i.e., A→US followed by AX→US.

Design:
Group          Phase 1   Phase 2   Test X
Blocking       A+        AX+       cr
Acq. Control   B+        AX+       CR

Kamin, 1968
Kamin's (1968) Interpretation of Blocking

Group   Phase 1   Phase 2
Block   A→US      AX→US
Acq     B→US      AX→US

The US has to be "surprising" to the animal for learning of the blocked CS-US association to occur. Because A already predicts the US in the Blocking group, the US is not surprising during Phase 2 trials.
Rescorla and Wagner (1972) Model
Formalized the notion of "surprise" as a learning factor:
ΔV_CS = αβ(λ − V_SUM)

Phase 2 updates (αβ = 1):
Blocking group: ΔV_X = αβ(λ − [V_A + V_X]) = 1(1 − [1 + 0]) = 0
Acq group:      ΔV_X = αβ(λ − [V_A + V_X]) = 1(1 − [0 + 0]) = 1

Group   Ph. 1   Ph. 2   V_A   V_X
Block   A+      AX+     1     0
Acq     B+      AX+     1     1
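The blocking computation above can be reproduced directly. This sketch is my own illustration (αβ = 1, as in the slide's example); every CS present on a trial is updated by the shared prediction error:

```python
def rw_update(v, cues, lam, ab=1.0):
    """Rescorla-Wagner: every present cue changes by ab * (lam - V_sum)."""
    error = lam - sum(v.get(c, 0.0) for c in cues)
    for c in cues:
        v[c] = v.get(c, 0.0) + ab * error

block, acq = {}, {}
rw_update(block, {"A"}, lam=1.0)        # Phase 1: A+  -> V_A = 1
rw_update(block, {"A", "X"}, lam=1.0)   # Phase 2: AX+ -> error = 0, V_X stays 0
rw_update(acq, {"B"}, lam=1.0)          # Phase 1: B+  -> V_B = 1
rw_update(acq, {"A", "X"}, lam=1.0)     # Phase 2: AX+ -> error = 1, V_X = 1
print(block["X"], acq["X"])             # 0.0 1.0
```

Because the error term uses the summed strength of all cues present, a fully predicted US leaves nothing for X to learn in the Blocking group.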
Rescorla and Wagner (1972) Model
Accounts for contingency effects:
• Blocking (Kamin)
• Overshadowing (Pavlov): AX+, the A-US association develops faster than the X-US association.
• Conditioned inhibition (Pavlov): A+/AX−; on AX− trials ΔV_X = αβ(λ − [V_A + V_X]) = 1(0 − [1 + 0]) = −1. X develops negative associative strength!
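Running the same summed-error update over interspersed A+ and AX− trials drives V_X negative. An illustrative simulation (the learning rate and trial count are my choices, not from the source):

```python
def rw_trial(v, cues, lam, ab=0.3):
    """Rescorla-Wagner: each present cue changes by ab * (lam - V_sum)."""
    error = lam - sum(v[c] for c in cues)
    for c in cues:
        v[c] += ab * error

v = {"A": 0.0, "X": 0.0}
for _ in range(100):                   # interspersed A+ and AX- trials
    rw_trial(v, {"A"}, lam=1.0)        # A -> US
    rw_trial(v, {"A", "X"}, lam=0.0)   # AX -> no US
print(round(v["A"], 2), round(v["X"], 2))  # A near +1, X near -1
```

At equilibrium A remains a strong excitor while X settles at the mirror-image negative value, exactly the conditioned-inhibition pattern the model predicts.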
E.L. Thorndike
The Invention of S-R Psychology: Connectionism à la Thorndike
• Influenced by Darwin, associationism, and Pavlov – but was interested in associations based on outcome or consequence.
• Animal intelligence? Anecdote instead of careful observation and experiment. See Clever Hans, the horse.
The “Puzzle Box”
Learning: accident, insight or effect?
“When put into the box, the cat would show evident signs of discomfort and impulse to escape from confinement. It tries to squeeze through any opening; it claws and bites at the wire; it thrusts its paws out through any opening and claws at everything it reaches…. It does not pay very much attention to the food outside but seems simply to strive instinctively to escape from confinement…. The cat that is clawing all over the box in her impulsive struggle will probably claw the string or loop or button so as to open the door. And gradually all the other unsuccessful impulses will be stamped out and the particular impulse leading to the successful act will be stamped in by the resulting pleasure, until, after many trials, the cat will, when put in the box, immediately claw the button or loop in a definite way" (Thorndike, 1913:13).
Watching cats in William James's attic.
The Three “Laws”Thorndike's theory consists of three primary laws:
(1) Law of Effect – responses to a situation that are followed by a rewarding state of affairs (satisfiers) will be strengthened and become habitual responses to that situation; the opposite holds for annoyers.
(2) Law of Readiness – a series of responses can be chained together to satisfy some goal, which will result in annoyance if blocked (expectancy).
(3) Law of Exercise – connections become strengthened with practice and weakened when practice is discontinued.
A corollary of the Law of Effect was that responses that reduce the likelihood of achieving a rewarding state (i.e., punishments, failures) will decrease in strength.
Thorndike's legacy to education
Frequency not sufficient – drill and rote memory fell out of favor after Thorndike's Law of Effect was popularized.
Punishment not as effective as reward – corporal punishment seen as less justified.
Thorndike Time-Line
1874 The birth of Edward Lee Thorndike
1897 Applied for graduate program at Columbia University
1898 Awarded his doctorate
1899 Instructor in Psychology at Teachers College, Columbia
1905 Formalized the Law of Effect
1911 Published "Animal Intelligence"
1912 Elected President of American Psychological Association
1917 One of the first psychologists admitted to the National Academy of Sciences
1921 Ranked #1 in American Men of Science
1934 Elected President of the American Association for the Advancement of Science
1939 Retired
1949 Thorndike died
B. F. Skinner (1904-1990)
Behavior of Organisms (1938)
Generic nature of stimulus and response (1935)
The state of the world: stimulus field
Response: equivalence set
Skinner
■ Respondent behavior - elicited by a known stimulus
■ Operant behavior - emitted by the organism
■ Type S or respondent conditioning
■ Type R or operant conditioning
Skinner: The 3-term Contingency
{Stimulus}---->RESPONSE---->SR+
Environment         EMITTED BEHAVIOR   CONSEQUENCE
foraging context    search of area     food
Skinner
■ Deprivation state
■ Magazine training
■ Shaping – reinforcement of successive approximations toward a goal response
– differential reinforcement
– successive approximation
■ Extinction – removal of the reinforcer maintaining a response such that the response is eliminated
Contingencies of Reinforcement

HEDONIC value   ACCESS: Present                  ACCESS: Remove
Positive        Reward learning                  Negative Punishment
                ("praise", "food",               ("time out")
                "token economies")
Negative        Positive Punishment              Negative Reinforcement
                ("physical pain", "criticism")
Underlying Response Dimensions
Operant examples: "corner search", "biting bar", "lever pressing"
The probability of a response increases if it produces reinforcement.
[Figure: y-axis = probability of response.]
Skinner on Punishment
■ Negatives of punishment:
– emotional byproducts
– ineffective – not long lasting
– states what the organism should not do, not what it should do
– justifies inflicting pain – punishment is often administered for the gratification of the punisher
– punishment elicits aggressive behavior
Skinner - Schedules of Reinforcement
■ There are a large number of reinforcement schedules (Ferster & Skinner, 1957).
■ Seven we will address:
– continuous
– fixed ratio (FR)
– fixed interval (FI)
– variable interval (VI)
– variable ratio (VR)
– concurrent
– DRL and DRO
Skinner - Schedules of Reinforcement
■ Continuous – every response is reinforced.
■ Fixed ratio – the same, except reinforcement occurs after every nth response; the ratio can be one-to-one, five-to-one, etc.
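For illustration (not from the source), the ratio logic can be sketched as a counter: a continuous schedule is just FR-1.

```python
def fixed_ratio(n):
    """Return a respond() function that reinforces every nth response (FR-n)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True    # reinforcer delivered
        return False
    return respond

fr5 = fixed_ratio(5)                  # FR-5; fixed_ratio(1) is continuous
outcomes = [fr5() for _ in range(10)]
print(outcomes)                       # responses 5 and 10 are reinforced
```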
Skinner - Schedules of Reinforcement
[Figure: cumulative record under a fixed-ratio schedule; the post-reinforcement pause produces a step-ladder effect.]
Skinner - Schedules of Reinforcement
[Figure: cumulative record under a variable-ratio schedule; high, steady response rate (high productivity).]
Skinner - Schedules of Reinforcement
[Figure: cumulative record under a fixed-interval schedule; responding accelerates toward the end of each interval (scalloping effect).]
Skinner - Schedules of Reinforcement
[Figure: cumulative record under a variable-interval schedule; moderate, steady responding.]
Herrnstein's Matching Law
■ For a concurrent reinforcement schedule (a rat responds on two levers, each on a different reinforcement schedule), the relative frequency of behavior matches the relative frequency of reinforcement.
Matching Law

B1 / (B1 + B2) = R1 / (R1 + R2)
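The prediction is a one-liner. A minimal sketch (my illustration; the reinforcement rates are made-up numbers):

```python
def matching(r1, r2):
    """Predicted proportion of responses on alternative 1: R1 / (R1 + R2)."""
    return r1 / (r1 + r2)

# e.g., two VI schedules delivering 40 vs. 20 reinforcers per hour:
print(matching(40, 20))  # the rat should allocate ~2/3 of responses to lever 1
```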
Skinner
■ Teaching machines
■ Programmed learning
■ Contingency contracts
■ Skinner as social theorist
■ Skinner introducing teaching machines
Skinner- Verbal Behavior (1957)
■ Language:
– mand
– tact
– echoic
– autoclitic
■ Critic: Noam Chomsky
■ Skinner framing the debate.
Anomalies, misbehavior, and confusion about operant and Pavlovian conditioning.
Polydipsia (Adjunctive Behavior)
Superstitious Pigeons
Staddon & Simelhag (interim and terminal behaviors)
Premack Principle
Optimality
In the Brain.
Premack Principle
■ A less frequently engaged-in activity can be reinforced by the opportunity to engage in a more frequently engaged-in activity.
Timberlake’s Disequilibrium hypothesis
■ Given a free-choice situation, an organism will distribute its time across different behaviors; those behaviors set an equilibrium. If the animal falls below the equilibrium for a behavior, it will be motivated to increase that behavior to restore it. Behaviors pushed too far above equilibrium create the reverse situation.
Misbehaviors
■ Breland and Breland – instinctual drift
■ Autoshaping – an animal will automatically condition itself instinctually if motivated. Pigeons peck at an illuminated disc prior to eating; they learn to do this automatically.