13
Psychological Review 1971, Vol. 78, No. 1, 58-70 PUNISHMENT: METHOD AND THEORY' PHILIP J. DUNHAM 2 Dalhousie University A methodological framework for the analysis of punishment is outlined. The methodology, which is called a multiple-response base-line procedure, serves two purposes. First, it raises a number of new questions about the properties of punishment. Second, it permits the examination of some untested assumptions found in traditional punishment theory. Initial evidence obtained with the multiple-response methodology questions the validity of traditional theoretical assumptions and suggests two simple rules for predicting the properties of various punishment operations. When an aversive stimulus is contingent upon the occurrence of a particular response, a decrement in the probability of the re- sponse is usually observed. This procedure is typically called punishment and the decre- ment in response probability is called punish- ment suppression. The basic purpose of this paper is to delineate some fundamental prob- lems with existing punishment theory and to suggest an alternative approach to the prob- lem of punishment. An Overview of Punishment Theory Historically, there have been two funda- mental assumptions used to explain the phe- nomenon of punishment suppression. The first of these assumptions was the strong version of the negative Law of Effect pro- posed by Thorndike (1913). Thorndike assumed that any painful or unpleasant event would weaken the response (or assumed S-R bond) which preceded that event. Thorndike (1932) subsequently rejected this notion and it has not enjoyed any serious attention since that time. The second funda- mental assumption suggested to account for 1 Research reported in this paper was supported by Project Grant APA-194 from the National Research Council of Canada. Many valuable suggestions have been made by different individuals during the preparation of this manuscript. The author is particularly grateful to N. J. Mackin- tosh, C. J. Brimer, and B. Moore for their critical comments. 2 Requests for reprints should be sent to Philip J. Dunham, Department of Psychology, Dalhousie University, 1460 Oxford Street, Halifax, Nova Scotia, Canada. the punishment suppression phenomenon has been referred to as the alternative-response assumption (cf. Dunham, Mariner, & Adams, 1969). In its simplest form, the assumption states that the decrement in a punished response is caused by an increment in some alternative behavior. All contemporary explanations of punish- ment suppression are specific elaborations of this alternative-response assumption. Those specific elaborations which have been most formalized fall into two major cate- gories. These categories are referred to as single-process and two-process theories of punishment (cf. Solomon, 1964). The ear- mark of the single-process theory is the assumption that only one type of learning mechanism is involved in the development and maintenance of the alternative response during punishment training. Two types of single-process theory have been suggested and are differentiated in terms of suggesting either a classical or an instrumental con- ditioning mechanism. Estes and Skinner (1941), for example, suggested that emo- tional responses elicited by the punishing event are classically conditioned to stimuli which precede the punishing event. The classically conditioned behavior is assumed to compete with the punished response and cause the suppression. Miller and Dollard (1941) exemplify the instrumental condi- tioning version of single-process theory. They suggested that any response which is associated with the termination of the punishing stimulus will be instrumentally 58

PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

Psychological Review1971, Vol. 78, No. 1, 58-70

PUNISHMENT:

METHOD AND THEORY'

PHILIP J. DUNHAM 2

Dalhousie University

A methodological framework for the analysis of punishment is outlined.The methodology, which is called a multiple-response base-line procedure,serves two purposes. First, it raises a number of new questions aboutthe properties of punishment. Second, it permits the examination of someuntested assumptions found in traditional punishment theory. Initialevidence obtained with the multiple-response methodology questionsthe validity of traditional theoretical assumptions and suggests two simplerules for predicting the properties of various punishment operations.

When an aversive stimulus is contingentupon the occurrence of a particular response,a decrement in the probability of the re-sponse is usually observed. This procedureis typically called punishment and the decre-ment in response probability is called punish-ment suppression. The basic purpose of thispaper is to delineate some fundamental prob-lems with existing punishment theory and tosuggest an alternative approach to the prob-lem of punishment.

An Overview of Punishment Theory

Historically, there have been two funda-mental assumptions used to explain the phe-nomenon of punishment suppression. Thefirst of these assumptions was the strongversion of the negative Law of Effect pro-posed by Thorndike (1913). Thorndikeassumed that any painful or unpleasant eventwould weaken the response (or assumedS-R bond) which preceded that event.Thorndike (1932) subsequently rejected thisnotion and it has not enjoyed any seriousattention since that time. The second funda-mental assumption suggested to account for

1 Research reported in this paper was supportedby Project Grant APA-194 from the NationalResearch Council of Canada. Many valuablesuggestions have been made by different individualsduring the preparation of this manuscript. Theauthor is particularly grateful to N. J. Mackin-tosh, C. J. Brimer, and B. Moore for their criticalcomments.

2 Requests for reprints should be sent to PhilipJ. Dunham, Department of Psychology, DalhousieUniversity, 1460 Oxford Street, Halifax, NovaScotia, Canada.

the punishment suppression phenomenon hasbeen referred to as the alternative-responseassumption (cf. Dunham, Mariner, &Adams, 1969). In its simplest form, theassumption states that the decrement in apunished response is caused by an incrementin some alternative behavior.

All contemporary explanations of punish-ment suppression are specific elaborationsof this alternative-response assumption.Those specific elaborations which have beenmost formalized fall into two major cate-gories. These categories are referred to assingle-process and two-process theories ofpunishment (cf. Solomon, 1964). The ear-mark of the single-process theory is theassumption that only one type of learningmechanism is involved in the developmentand maintenance of the alternative responseduring punishment training. Two types ofsingle-process theory have been suggestedand are differentiated in terms of suggestingeither a classical or an instrumental con-ditioning mechanism. Estes and Skinner(1941), for example, suggested that emo-tional responses elicited by the punishingevent are classically conditioned to stimuliwhich precede the punishing event. Theclassically conditioned behavior is assumedto compete with the punished response andcause the suppression. Miller and Dollard(1941) exemplify the instrumental condi-tioning version of single-process theory.They suggested that any response which isassociated with the termination of thepunishing stimulus will be instrumentally

58

Page 2: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

PUNISHMENT: METHOD AND THEORY 59

conditioned as a response which escapespain and competes directly with the punishedresponse.

Two-process punishment theories specifytwo different learning mechanisms which aresequentially involved in the development andmaintenance of the assumed alternative re-sponse. Dinsmoor (1954, 1955) andMowrer (1947) are formal examples of thetwo-process explanation of punishment sup-pression. Dinsmoor suggested that theproprioceptive stimulus feedback from thepunished response acquires secondary aver-sive properties via classical pairings with theprimary aversive event. Any responsewhich is instrumental in disrupting this chainof conditioned aversive stimulation will de-velop and be maintained as a response whichcompetes with the punished response.Hence, two learning mechanisms, first classi-cal then instrumental, are assumed to operatein the development of the alternative be-havior. Mowrer's (1947) version of two-process theory substitutes the notion of con-ditioned fear for the notion of aversivestimulation conditioned in the Dinsmoortheory.

There are two basic implications of anyversion of the alternative response theory.First, there is the implication that somealternative behavior will develop and bemaintained during punishment training.Second, there is the implication that thisalternative response causes the reductionin the punished response. Presumably theformer implication could be confirmed inde-pendent of the latter. But the latter couldnot be confirmed if the former were false.

In spite of the substantial amount of re-search on punishment in the last decade,there is no direct evidence to support eitherof the above implications of the alterna-tive-response assumption. As Azrin andHolz (1966) have stated, the typical pro-cedure in punishment research has beento infer the presence of the alternative re-sponse 'from the absence of the punishedresponse. A minimal requirement for test-ing the assumption is the measurement ofthe alternative response independent of thephenomenon which it seeks to explain.

Two reasons can be suggested for thelack of direct evidence bearing on the alter-native-response assumption. In the case ofsingle-process theories, contingencies arespecified which make the response elicitedby the punishing event a prime candidate forconditioning during the suppression of thepunished response. It is not a profound ob-servation to note that punishment procedureswhich have traditionally been employed in-volve organisms, aversive stimuli, and ap-paratus which make it difficult to measurethose responses which are suspected to par-ticipate in the relevant contingency. These"emotional" behaviors have not been re-corded on impulse counters and this has pre-vented the accumulation of any evidence con-cerning changes in their probability duringpunishment training. The major problemwith single-process versions of the alterna-tive-response assumption at this point wouldappear to be the lack of an adequate method-ology to test what would appear to be verytestable implications.

With respect to the two-process theories,the problem is more serious. As Schusterand Rachlin (1968) have suggested:

Because both the reinforcer and the response areunobserved and unobservable, the two factor theoryof punishment poses a serious problem for theexperimenter who wishes to test it: how can it bedisproved? All the critical events are assumed tooccur within the organism being punished [p. 784].

In addition to the specification of the criticalcontingencies "inside" the organism, itshould be noted that any increase in somealternative behavior observed during punish-ment can be taken as support for the opera-tion of the assumed internal contingency.Hence, the measurement of the unobservedbehavior referred to by Schuster and Rachlindoes not make the positions any more sus-ceptible to disproof.

The picture which emerges from this briefoverview of punishment theory and researchis that there has been a lack of interactionbetween punishment theory and punishmentdata. The lack of any data relevant to themost fundamental theoretical assumptionshas permitted the alternative response inter-pretations to persist as originally formulated.

Page 3: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

60 PHILIP J. DUNHAM

In the next section of the discussion, amethodological approach to punishment willbe described which can serve two functionsin the context of the existing punishmentliterature. First, the methodology permitsus to examine some questions about theproperties of punishment which have notpreviously been considered. Second, itpermits us to examine some previously un-tested implications of existing punishmenttheory.

A Methodological Approach to Punishment

The methodological approach to be sug-gested is called a multiple-response base-lineprocedure. It can be applied to a variety ofproblems in addition to punishment, andwhen viewed in the context of traditionalmethodology, it falls between a typical oper-ant procedure, in which a single response isshaped under constraints, and typical etho-logical procedures, in which behavior is ob-served without external laboratory con-straints. The most convenient way to de-scribe the essential features of the methodol-ogy is to elaborate on a specific hypotheticalexample which is representative of severalobvious variations on the basic approach.The hypothetical example should be notedwith some care since it will be approximatedin reality when experimental evidence is sub-sequently discussed.

Consider a small animal chamber withgrid floor and three sources of enjoymentfor the small rodent commonly called a Mon-golian gerbil (Meriones unquiculates). Thethree items of interest in the chamber are afood bin with an unlimited supply of stan-dard Noyes pellets, a drinking tube withunlimited supply of water, and adding ma-chine paper which is threaded through aslot in the wall of the chamber.

The reader familiar with the behavior ofthis curious rodent will not be surprised tofind that the gerbil will spend much of ahalf-hour daily session in the chamber shred-ding the adding machine paper, eating thefood pellets, and drinking from the tube,in that order of preference. It is relativelyeasy to record the duration of each of thesethree behaviors during the session and con-vert these duration measures to a scale of

response probability by dividing the amountof time spent in a particular response stateby the total time possible (cf. Premack,1965). Any type of behavior can be mea-sured in terms of its duration, including theclass of behavior which is labeled "doingnothing." With the appropriate manipu-landa for a particular organism and controlof the relevant parameters, a multiple-re-sponse steady-state base line of behavior,in which several measurable responses areobserved to fill experimental time, can beestablished. No shaping or active manipula-tion of contingencies is assumed to be nec-essary. Assume, for the purposes of discus-sion, that our gerbil cooperated and filledthe half hour with drinking (p = .1), eating(p = .3), and paper shredding (p = .6).

Once the multiple-response base line hasstabilized, we are in the interesting positionof actively manipulating the organism's en-vironment and assessing the effects of suchmanipulations on all of the responses in therepertoire. Obviously, the manipulations ofmost interest in the present context are thosewhich we call punishment operations; how-ever, it is instructive to digress briefly andconsider a simple manipulation like makinga running wheel available. The most visibleeffect of making the running wheel availablewill be introduction of a running responseinto a repertoire of behavior which alreadyfills experimental time. If running occurs,by definition, there will be a decrement inthe observed probability of one or more ofthe existing responses in the repertoire.Curiously, there is little more than intui-tion to tell us how the organism will re-organize his response hierachy to accommo-date the running response. Will he sacrificea little bit of each of the existing behaviors ?Will he select one response and sacrifice itfor running privileges? If the latter, whatis the rule of response selection?

If an aversive stimulus like electric shockis introduced, one is faced with roughly thesame problem as that posed by the introduc-tion of the running wheel. Shock, as an un-conditioned stimulus, will define a certainprobability of unconditioned behavior whichmust be assimilated into a response hier-archy. By definition, some decrement in the

Page 4: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

PUNISHMENT: METHOD AND THEORY 61

probability of one or more of the existingresponses must take place. Again, we arenot sure how the organism alters the existingpreference structure to accomodate this addi-tional behavior, but the rules which describesuch alterations would be of importance inpredicting the immediate effects of any va-riety of punishment operation on responsesin the repertoire.

The point to be made with the two pre-ceding examples is that some manipulationswhich can be used in the context of a multi-ple-response base line have the property ofadding a response to the existing repertoireof behavior. Punishment procedures are onesuch manipulation, and it is suggested thatprocedures be arranged which permit thea priori specification of the response whichwill be added and subsequent measurementof that response during punishment training.In this respect, the selection of the gerbilwas fortuitous. The response which we haveobserved to be associated with the introduc-tion of shock is a vigorous biting and chew-ing of the grid floor. As reliable, if notas desirable, is an aggressive attack onanother gerbil if the target animal is pro-vided.

When one considers the variety of waysin which an aversive event like electric shockcan be introduced into a multiple-responseprocedure, a number of empirical questionsare generated which have not received pre-vious experimental attention. A brief con-sideration of a few of these questions willillustrate the heuristic value of the multiple-response base-line procedure.

The traditional literature dealing withshock provides us with two major classesof operation which are called punishmentand avoidance, with variations on eachtheme. In the context of the multiple-re-sponse base-line procedure, an alternativeoperational dichotomy is suggested. Spe-cifically, the shock event can be deliveredaccording to a program which makes refer-ence to one or more responses in the reper-toire, or according to a program withoutreference to the organism's response reper-toire. The former includes traditional re-sponse-contingent punishment and Sidmanavoidance operations; the latter includes

various temporal schedules which arearranged independent of the organism's be-havior. In future discussion, the termsresponse contingent and noncontingent willbe used to describe these generic operationalcategories. Some measure of the compre-hensiveness of any theoretical framework isprovided by its ability to subsume data inboth operational categories. Several of theseoperations, most often employed in single-response operant research, reveal interest-ing new dimensions when considered in themultiple-response context.

Consider the response-contingent punish-ment operation where one response is se-lected from the repertoire and the onset ofthat response is followed immediately byan instance of shock. The suppressiveeffects of this operation on the referent be-havior are well known. However, the opera-tion defines a version of Sidman avoidancecontingency for all other responses in theorganism's repertoire, including the responseelicited by the shock event. In the gerbil ex-ample, response-contingent shock for eatingalso defines a very effective Sidman avoid-ance contingency for paper shredding. Themore time spent paper shredding, the longerthe interval between shocks and the fewershocks received. Does the gerbil adjust hisbehavior in a manner suggested by thecontingency ?

Consider a fixed-interval noncontingentshock procedure. For a given density ofshock, the probability of each response inthe organism's repertoire will determine thenumber of shocks which are associated withthat particular response. In the gerbil ex-ample, paper shredding occupies over halfthe experimental session, hence more shockswill arrive during paper shredding than dur-ing any other response in the repertoire.Of more interest, with shocks delivered atfixed intervals of time, that response whichmost consistently follows the shock eventwill most consistently predict the longestinterval of "safety." Again, one must ask ifthe organism changes his behavior accord-ing to these implicit contingencies defined bythe fixed-interval noncontingent schedule.A phenomenon reported initially by Morse,Mead, and Kelleher (1967) is perhaps rele-

Page 5: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

62 PHILIP J. DUNHAM

vant. These investigators reported thatmonkeys which are exposed to fixed-inter-val noncontingent shock will aggressivelybite a rubber tube made available after eachshock is delivered. The postshock elicitedbiting behavior would be the response in theorganism's repertoire which most consist-ently predicts the longest interval of safety.As suggested by the preceding analysis, theanimal changes its biting behavior accord-ingly. The amount of biting behavior in-creases during extended punishment train-ing and eventually fills the shock-shockinterval with a "scallop" in the rate of bitingwhich appears at the end of the interval.

Consider also a Sidman avoidance pro-cedure with a constant shock-shock intervaland constant response-shock interval. Oncewe select a single response from the multi-ple-response base line as an avoidance re-sponse, all other responses in the repertoireare initially punished according to a non-contingent shock schedule, with the shortestpossible intershock interval being the Sid-man shock-shock interval. Which of theresponses in the organism's repertoire shouldone select for the avoidance response forefficient learning? It would seem obviousthat the response elicited by the shock wouldenjoy some advantages not enjoyed by theother responses in the repertoire. First, thatresponse will predict the longest safe intervalbetween shocks even if the avoidance con-tingency were not in effect. Second, it hasa relatively high probability during initialtraining sessions, hence it will sample animplicit avoidance contingency quite often.Third, if it is explicitly given the avoidanceproperty of delaying shock, there should befew responses which will compete with it forrapid learning. This suggestion is in linewith Holies' (1970) view that the organism'sinnate species-specific defense reactions(SSDRs) to an aversive stimulus are mostreadily learned as avoidance responses. Iwould suggest, however, that the rapid learn-ing has nothing to do with their "innatedefensive" properties. It is the property ofimmediately following the shock event whichoptimizes the conditions for avoidance learn-ing—that is, not only does the responseeventually reduce shock frequency, but it

initially predicts safety for a longer periodof time than any other response in theorganism's repertoire.

The converse situation can also be sug-gested. If we give the elicited responsethe initial advantage of consistently pre-dicting the longest safe interval of time, yetattempt to train a different response as anavoidance behavior, it would not be sur-prising to observe the animal attempting ini-tially to develop the elicited behavior to thedetriment of the avoidance response. Thiswould be particularly true in the case ofconstant shock-shock intervals (e.g., theMorse, Mead, & Kelleher, 1967, phenome-non), as opposed to variable shock-shockintervals where the predictive propertiesof the elicited response are less evident.This also leads to the suggestion that avariable shock-shock interval will producefaster learning of avoidance behavior whenthe avoidance response is other than theshock-elicited response (cf. Bolles & Popp,1964).

Finally, it is of some interest to considerthe variations in temporal schedule whichare possible with noncontingent shock. Wehave already discussed the fixed-intervalcase; now consider some implicit contingen-cies established by variable-interval cases.When programming shocks to occur at vary-ing intervals in time, the distribution ofintervals may be an important consideration.The typical variable-interval tape programis a rectangular distribution of intervals withsome guaranteed minimum interval. It istypically described in terms of its arithmeticmean and delivers the shocks with a randomsequence of intershock intervals. Of moreinterest is an exponential distribution ofintervals which will deliver the same num-ber of shocks at varying intervals, but is acontinuous distribution. The probability ofshock at any "moment" in time is a fixedvalue and the animal has no programmedinterval which is guaranteed to be safe. Intypical punishment studies where a singleoperant response is shaped, trained, andmeasured, the effects of shocks deliveredaccording to fixed intervals, rectangular dis-tributions of variable intervals, and exponen-tial distributions of variable intervals, may

Page 6: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

PUNISHMENT: METHOD AND THEORY 63

be the same—the operant is suppressed.However, in the multiple-response proced-ure, very different effects may be found withthe three distributions. As mentioned ear-lier, the elicited response enjoys the advant-age of predicting a safe period most con-sistently in the fixed-interval shock schedule.This would be true to a lesser degree ofvariable-interval schedules with rectangulardistributions. The elicited response wouldmost consistently predict the minimum inter-val of safety programmed in that distri-bution. In the exponential distribution,however, no response consistently predictsa minimum safe interval. According to theprogram, shock is equally probable at anypoint in time. This removes those implicitcontingencies for certain responses which aremost evident in fixed-interval schedules.Again the question which must be asked iswhether or not the organism adjusts his be-havior according to the presence and absenceof these implicit contingencies.

The preceding examples are intended toillustrate two points: first, the heuristicvalue of the multiple-response analysis interms of generating testable questions; sec-ond, the sparsity of evidence relevant tothese questions. This sparsity is understoodwhen one recognizes the emphasis which hastraditionally been placed on single-responsemeasurement in both free operant and dis-crete-trial punishment research.

The discussion of the multiple-responsemethodology can be concluded with a briefconsideration of the predictions which single-process and two-process punishment theorieswould make in the context of the hypo-thetical gerbil procedure. With respect tosingle-process theory, the use of any typeof punishment operation will introduce thegrid-biting response into the organism's re-sponse repertoire. This grid-biting responseshould be a prime candidate for both classicalconditioning (Estes & Skinner, 1941) andinstrumental escape conditioning (Miller &Bollard, 1941) since it follows the shockevent more often than other responses. Thepredictions of the two-process versions of thealternative-response assumption must be con-sidered separately in terms of predictionsabout the effects of contingent and noncon-

tingent shock procedures. Basically, thetwo-process theories suggest that any re-sponse which disrupts the chain of dis-criminative stimulus conditions which pre-cede the shock event will increase in prob-ability and be maintained during punishmenttraining. In the case of response-contingentpunishment, these theories make the generalprediction that any response other than thepunished response might increase in prob-ability. They do not provide us with a re-sponse-selection rule to tell us if one or allof the unpunished responses increases. Inthe case of noncontingent shock operations,the two-process theories fail to make anymeaningful prediction. When a nonconting-ent shock operation is employed, every re-sponse in the organism's response repertoirewill be part of a discriminative chain of stim-uli which terminates with shock on someoccasions. Hence, it is impossible to suggestthat any response in the repertoire partici-pates in an instrumental contingency whichdisrupts the chain of conditioned aversivestimulation.

Some Rules for Prediction

The predictions which the single-processand two-process explanations of punishmentsuppression make in the context of the multi-ple-response methodology have been dis-cussed in the preceding section. Prior toconsidering some evidence, I would like tosuggest some alternative rules for predictingthe effects of a variety of punishment opera-tions on the various behaviors measured inthe multiple-response procedure. The rulesare, at this point, tentative, and a systematicanalysis of punishment in the context of amultiple-response base line may modify orcontraindicate these initial suggestions.

Once the immediate change in perform-ance caused by the introduction of an uncon-ditioned behavior into the repertoire hastaken place (shock is introduced), the or-ganism has a hierarchy of responses de-scribed in terms of response probability. Thetwo rules which attempt to predict the fateof each of the responses in the organism'srepertoire during subsequent aversive train-ing are the following: (a) That particularresponse in the organism's repertoire which

Page 7: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

PHILIP J. DUNHAM

is most frequently associated with shockonset and/or predicts the onset of shockwithin a shorter time than other responseswill decrease in probability and remain be-low its operant base line. (&) That par-ticular response in the organism's repertoirewhich is most frequently associated with theabsence of shock onset and/or predicts theabsence of shock onset for a longer periodof time than other responses will increasein probability and remain above its operantbase line.

Consider the spirit in which these tworules are formulated. I am suggesting thatthere are two basic contingencies of import-ance in any operation for delivering theshock. First, there is an instrumentalpunishment contingency. It has two im-portant dimensions. A response can be morefrequently associated with shock onset thanother responses, or it can predict a givenfrequency of shock onset within a shorterperiod of time, or both. Thus, the two di-mensions of the contingency are frequencyand time. Second, there is an instrumentalavoidance contingency. It also has twoimportant dimensions. A response can bemore frequently associated with the absenceof shock onset, or can predict the absence ofshock onset for a longer period of time, orboth.

In traditional single-response operant me-thodology, the time not spent on the operantis usually assigned the label of "nonresponse"and is assumed to be a homogeneous mass ofbehaviors, (cf. Rachlin & Herrnstein, 1969).In spite of their tentative nature, the tworules described above should make it obviousthat it may be of more value to recognizethe "heterogeniety" of the nonreponse class—at least in terms of response probabilitydifferences. Further empirical analysis mayreveal such factors as the sequential depend-encies between different responses to be animportant determinant of the organism's ad-justment to aversive contingencies. Thiscriticism of the single-response operant ap-proach is equally applicable to the procedureswhich employ appetitive contingencies.

Some EvidenceOver the past year, the research com-

pleted by the author and his associates con-

sists of a series of successive approxima-tions to the multiple-response methodologywhich has been discussed throughout thispaper. Many of the thoughts developed inthe preceding discussion of punishment orig-inated with a serendipitous observation madewhile studying the effects of punishment onkey pecking in pigeons (Dunham et al.,1969). Dunham et al. observed that pigeonstrained to peck a response key for grain ona variable-interval schedule missed the keyand hit the wall area adjacent to the key witha certain number of pecks. The introductionof key-peck response-contingent punishmentsuppressed key pecking and increased thefrequency of off-key pecks during punish-ment. These results suggested that an un-punished, response, other than the shock-elicited response, will increase in probabilityand be maintained during punishment train-ing.

Subsequently, a more deliberate approxi-mation to the multiple-response methodologywas attempted, taking advantage of the phe-nomenon of schedule-induced polydipsia (cf.Falk, 1966) in order to establish two steady-state responses which occupied a large por-tion of experimental time.3 Falk (1966)reported that rats trained to lever press forfood pellets on a variable-interval schedulewill indulge in an excessive amount of drink-ing during an experimental session if adrinking tube is made freely available. UsingFalk's standard procedure, two rats weretrained one hour each day to the point ofpolydipsic drinking, and a mild .2-milliamp-ere, .5-second shock was introduced whichwas contingent on the lever press response(Subject 1) or on the drinking response(Subject 2). With the relatively mild shockintensity to minimize the probability of theunconditioned (and unmeasured) responseselicited by shock, it was assumed that drink-ing would be the most probable unpunishedalternative if lever pressing were punished,and that lever pressing would be the mostprobable unpunished alternative if drinkingwere punished. According to the rules out-lined in earlier discussion, the punished re-sponse was predicted to be suppressed, and

3 The author thanks Jon Little for his assistancewith this experiment.

Page 8: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

PUNISHMENT: METHOD AND THEORY 65

the most probable unpunished alternative inthe response repertoire was predicted to in-crease in probability following some degreeof initial disruption by the addition of shock-elicited behavior to the response repertoire.

The results of the initial polydipsia train-ing phase, the first punishment trainingphase, a recovery phase, and a secondpunishment training phase are illustrated inFigure 1. During initial training, theanimals developed polydipsic levels of waterintake similar to those usually reported byFalk (1966). In the initial punishmentphase, Subject 1 was shocked for lever press-ing and Subject 2 was shocked for drinking.In both cases, the punished response wasobserved to be suppressed immediately andthe unpunished alternative behavior was ob-served, following initial disruption, to in-crease to levels which exceeded the estab-lished prepunishment base-line level. Theresults obtained with Subject 1 were verysurprising. In spite of the aberrant (poly-dipsic) base line of water intake prior topunishment, the response was observed toexceed that base line during punishment ses-sions. In the third phase of the procedure,the punishment contingencies were removedand recovery was observed for 14 sessions.With the removal of shock, all responses in-creased to levels even higher than precedingbase lines. This overshoot following re-moval of punishment is typical of the recov-ery of punished responses (cf. Azrin & Holz,1966). It is interesting to note that over-shoot occurs with both the punished and un-punished alternatives, which were measuredin the present procedure. Following the 14days of recovery, punishment contingencieswere again introduced—this time for the re-sponse which had not been punished duringthe first punishment phase. Basically thesame phenomena observed during the firstpunishment phase were observed during thesecond. The punished responses suppressedand the unpunished alternatives were dis-rupted and then increased in probability.Unfortunately the increase in drinking ob-served in Subject 2 did not consistentlyexceed the prepunishment base line. Thismay be the result of the ceiling problem onwater intake referred to earlier.

The final line of evidence from the au-thor's laboratory work to date was anattempt to approximate the hypothetical ger-bil procedure used as an example throughoutthis article. There were several reasons toextend experimentation beyond the case oftwo highly probable responses to three ormore measured responses. First, the tenta-tive rules specify that only one of the severalunpublished alternative responses will in-crease in probability during punishmenttraining. In order to test this prediction,more than one alternative must be measured.Second, a test of single-process theories ispossible only if the unconditioned responseintroduced by the shock event is measured.Third, an alternative approach to the datain the polydipsia experiment would be thesuggestion that the shock has some generalarousal property which increases all un-published responding. The demonstrationof a single-response increase would makesuch an arousal mechanism less appealingthan the type of contingency analysis sug-gested by the tentative rules.

To answer these questions, KennedyMuyesu-Kaisha Munavi conducted a bur-densome experiment in the author's labora-tory which was a very close approximationto the hypothetical gerbil procedure de-scribed earlier. Nine gerbils were randomlyassigned to one of three groups. Each ger-bil was placed in a small response chamberwith food bin, drinking tube, and addingmachine paper for a daily half-hour session.The total duration of each of the three be-haviors was recorded on elapsed-time metersby an observer looking at the animalsthrough a one-way glass window. When agerbil entered the chamber, it had been per-mitted access to only 80% of its normalad lib intake of laboratory chow during thepreceding 23^ hours, and the only waterpermitted was that available during the ex-perimental session. Under these conditions,the gerbils managed to fill approximately25 of the 30 minutes available each sessionwith paper-shredding, eating, and drinking,in that order of preference. It should benoted that records were kept of the amountof food and water consumed during base-line observations. Subsequently, if eating

Page 9: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

66 PHILIP J. DUNHAM

b .§*:o _

Z 1^

a. _29;

1 3:ft

S-l pre- shock

D ___Q ̂ -*° ̂ "-0 — ° — D— °

S-2 pre* shock

y~"

shocklever

/shockdrink

recovery

o^._o— b— .-«

recovery

.7jr-*H3_D_»0_,_D^a

shock °— D leverdrink •— • drink

^o__0^— - -- *-o

shocklever

yx./-._x

2 4 6 8 10 12 14 16 18 20 22 24 26

TWO SESSION BLOCKSFIG. 1. Mean number of punished and unpunished responses plotted in two-session blocks

for Subject 1 (S-l) and Subject 2 (S-2) in the polydipsia procedure.

was the punished behavior, the animal waspermitted to make up any deficit in foodintake during the experimental session inits daily ration one hour after the experi-mental session. Hence, there was no con-founding of changes in either the total dailyfood intake or water intake with the intro-duction of the shock (cf. Dunham & Kilps,1969).

After a base line of three behaviors wasestablished, Group E was punished for eat-ing, Group P for paper shredding, andGroup D for drinking. The shock was a.2-milliampere, .S-second shock delivered atthe onset of the referent behavior and con-tinued at 2-second intervals until the referentresponse was stopped. At this point in theexperimentation with gerbils, it was notknown what elicited behavior or behaviorswould be introduced with this shock in-tensity. After running four animals throughthe first session of punishment it was obviousthat all animals were attacking the grid barsthrough which the shock was delivered. Atthis point a record was started of the dura-tion of grid biting as a response which con-sistently occurred with the shock in all nineanimals. Hence, we have the probability ofgrid biting during the first punishmentsession for only five of the nine subjects inthis initial experiment. After the first

session, grid biting was measured along withthe three other responses during each session.

Prior to examining the results of the ex-periment, it is of some value to consider thepredictions made by the rules when appliedto this situation: (a) the rules suggest thatthe response most frequently and immedi-ately associated with shock will decrease inprobability and remain below its operantbase line; it is the referent punished responsein the response-contingent procedure; (b)there will be an immediate disruption of allresponses in the repertoire immediatelyupon the introduction of the grid-biting re-sponse into the three-response repertoirewhich already nearly fills time; ( c ) afterthe initial disruption, the response in therepertoire which is most frequently asso-ciated with the absence of shock for thelongest period of time will increase in prob-ability to levels which exceed the prepunish-ment base line. This response should be thatresponse which is most probable on theassumption that the most probable behaviorwill sample the implicit avoidance conting-ency most often.

Subsidiary predictions are also implicit inthe rules. For example, if grid biting is notthe most probable unpunished alternative be-havior, it will drop out of the repertoire asthe shock frequency declines (punished re-

Page 10: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

PUNISHMENT: METHOD AND THEORY 67

paper-shreddinggrid-biting

eatingdrinking

1 3 5 7 9 11 13TWO SESSION BLOCKS

FIG. 2. Probability of punished and unpunished responses for the three subjects in each of the threegroups in the gerbil procedure. (Note that all data points are two-session blocks with the exceptionof the first punishment session which is plotted as a single-session point.)

sponse suppressed). The latter suggestionis contrary to the predictions of the tradi-tional single-process punishment theories.If, however, the grid biting response is themost probable of the unpunished alternatives,it will be maintained even in the absence ofthe shock UCS during punishment training.

The results of the gerbil experiment foreach subject in each of the three groups arepresented in Figure 2.

Consider first the two results which arecommon to all three groups and which areof primary interest in terms of the conting-ency rules. First, in all cases, the instru-mental punishment contingency was suf-cient to suppress the referent behavior.Second, and of more interest, in all cases,only one of the alternative responses in the

repertoire was observed to increase in prob-ability to levels which exceeded the base-line probability. These two results are takenas support for the operation of the con-tingencies outlined in preceding discussion.

Consider next the results specific to eachgroup. Of particular interest in this con-text is the response which is selected to in-crease in probability during the punishmenttraining. The subjects in Group E werepunished for eating, and all three of thesubjects were observed to increase the prob-ability of paper shredding following an initialperiod of disruption. The subjects in GroupP were punished for paper shredding, andin all three subjects the probability of gridbiting increased during punishment training.Group D was punished for drinking and re-

Page 11: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

68 PHILIP J. DUNHAM

vealed inconsistent results. Subject 1 andSubject 2 increased the probability of papershredding, while Subject 3 increased theprobability of grid biting during punishment.

To what extent are the responses observedto increase in each group predicted by thecontingency rules? The rules suggest thatthe response which most frequently samplesthe avoidance contingency in early sessionswill be the most probable of the safe alterna-tives. In spite of the fact that the probabilityof the grid-biting response was not measuredduring the first session in four of the animalsstudied, it is safe to assume that two re-sponses were the most probable unpunishedbehaviors during early punishment sessions.These are grid-biting and paper-shreddingresponses. Both would sample the avoid-ance contingency more frequently, and thegrid-biting response which followed the shockevent would predict the absence of shock fora longer period of time more consistentlythan the other responses which were notsequentially dependent on the shock event.For these reasons, we would expect all sub-jects to reveal an increase in either gridbiting or paper shredding. However, therewere some instances in which grid biting wasinitially more probable than paper shredding,yet paper shredding was observed to increaseto levels above base line as grid biting drop-ped out (e.g., Group E, Subject 1). Fur-ther research, in which more precise mea-surements are obtained of the degree towhich various responses sample both thefrequency and time dimensions of the twocontingencies, will provide a more criticaltest of the suggested rules and perhaps sug-gest refinements.

In addition to suggesting that the me-thodology and the contingency analysis aretractable approaches to the punishment prob-lem, the gerbil data question the validity ofthe single-process versions of the alterna-tive-response assumption. Stated quitesimply, those cases in the present situationwhich revealed a gradual decline in the prob-ability of grid biting over the course ofpunishment training are exactly the op-posite to changes expected from traditionalsingle-process assumptions. This is con-sistent with observations of shock-associated

behaviors in other situations (cf. Dunham,et al., 1969). The emotional behaviors gen-erally drop out within a very few sessionsof response-contingent punishment training.Of course, this decrement in grid-biting be-havior would not be expected in the caseof noncontingent punishment in the sensethat the UCS is maintained at a particularfrequency independent of the changes in be-havior.

The data also have implications for thetwo-process interpretations of punishmentsuppression. Basically, the very generalprediction that some alternative behavior willincrease is supported. It is questionable,however, to think that the increase observedin the alternative behaviors observed in thepresent experiments could account for thesuppression in the punished response. Thetime course of the two transition processesare very different. The punished behavioris suppressed very rapidly relative to theinitial disruption and slow rise of the alterna-tive behavior.

The preceding evidence involves the re-sponse-contingent punishment operation.The author has not yet started an empiricalanalysis of the noncontingent punishmentprocedures. There are, however, severalstudies in the literature which are relevantto the predictions made by the rules dis-cussed in this article as applied to noncon-tingent procedures. As indicated in earlierdiscussion, the organism cannot, by defini-tion, reduce the shock frequency when anoncontingent operation is employed. Thisleaves the temporal dimension of the con-tingency in an important role. The responsewhich consistently follows the shock eventwill predict the absence of the next shockonset for a longer length of time than anyother response in the repertoire. The phe-nomenon reported by Morse et al. (1967) inwhich an elicited behavior develops andmodulates under fixed-interval nonconting-ent shock suggests that the organism issensitive to the temporal dimension of theavoidance contingency. More recent evi-dence reported by Powell and Creer (1969)can be subjected to a similar interpretation.These investigators delivered noncontingentshock at the rate of 100 shocks per session

Page 12: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

PUNISHMENT: METHOD AND THEORY 69

to pairs of rats and recorded the frequencyof aggressive attacks per session. The au-thors did not specify the intershock intervalor the session length in their procedure.Ten successive days of noncontingent shocksessions were conducted and measures ofaggressive behavior indicated that an in-crease in the amount of aggression occurredover the course of the experiment. Althoughthe experiment was designed to determinethe effects of maturation (among otherthings) on aggressive behavior, it is sug-gested that the 10 days of maturation wereconfounded with 10 days of avoidance train-ing during the experiment. Specifically,the aggressive behavior of these rats pre-dicted the absence of shock onset for alonger period of time than any of the other:(unmeasured) behaviors in the organism'srepertoire. Similar to the monkeys in theMorse et al. procedure, the rats in thePowell and Creer experiment are assumedto have recognized this avoidance contin-gency and adjusted the behavior appro-priately.

Assuming for the moment that the avoid-ance interpretation of the Powell and Creerexperiment is correct, the present rules haveimplications for the development of aggres-sion; namely, that certain stimuli will elicitaggressive behaviors, and under a varietyof procedures used to deliver the stimuli,the aggressive behavior will develop as anavoidance response and be maintained inthe absence of the primary UCS. Underother conditions, the avoidance contingencyis not present and one should observe ag-gression at the unconditioned level. I knowof no theoretical account of aggression whichimplicates an avoidance contingency in thedevelopment and maintenance of aggressionin animals. Additional work with temporalschedules of aversive stimulation whJch elim-inate this avoidance contingency should re-veal the extent to which it is involved in theconditioning of the aggressive behavior ofvarious species.

Some Concluding RemarksIn the absence of more data, it would be

premature to attempt to elaborate on thetype of theoretical mechanism which is im-

plied by the empirical rules which have beenoutlined in the preceding discussion. Iwould like, however, to conclude by suggest-ing briefly the general directions which onemight take in the development of such amechanism. First, the data and argumentswhich have been discussed suggest verystrongly that there is little to recommendthose traditional theoretical accounts ofpunishment which start from an alternativeresponse assumption. In place of thisassumption, the rules which have been out-lined imply a symmetrical conditioning me-chanism in which (a) those responses whichpredict the aversive event are actively inhi-bited and (b) those responses which predictthe absence of the aversive event are activelyexcited.

History provides at least two examples ofsymmetrical excitatory and inhibitory me-chanisms which deal with appetitive andaversive events. If one is biased towardPavlovian conditioning mechanisms, it ispossible to conceptualize the multiple-re-sponse methodology as a multiple-CS me-thodology in which different responses areviewed as different CSs which predict, tovarying degrees, the presence or the absenceof the unconditioned stimulus (aversive orappetitive). Once conceptualized in termsof Pavlovian operations, some version ofPavlov's (1927) concepts of inhibition andexcitation can be developed to explain theeffects of punishment on performance.Konorski's (1967) inhibitory and excitatoryprocesses and some recent modifications ofthese ideas (e.g., Maier, Seligman, & Solo-mon, 1969) should, for example,, be con-sidered in the context of the multiple-re-sponse punishment procedure.*

4 A basic assumption in Konorski's theorizing isthe notion that noxious stimuli such as shock elicita drive state, fear, which has general inhibitoryeffects on "preservative" drive states such as hun-ger, thirst, etc. Punishment would be predicted tohave suppressive effect on all responses motivatedby "preservative" drive states. In the experimentsconducted thus far, the inhibitory effects whichwe have observed are specific to the punishedresponse. For example, when the gerbils werepunished for eating they continued to drink theusual amount each session. Hence, if Konorski'sinhibition mechanism were to be considered in thecontext of punishment, the inhibitory properties

Page 13: PUNISHMENT: METHOD AND THEORY'steelekm/.../psy5300/Documents/Dunham1971-punishmen… · mental assumption suggested to account for 1 Research reporte d in thi s paper wa supporte

70 PHILIP J. DUNHAM

Alternatively, if one is biased toward in-strumental conditioning mechanisms, it ispossible to conceptualize the multiple-re-sponse methodology in terms of instrumentalcontingencies in which each response haseither rewarding or punishing consequences.Once conceptualized in terms of instrumentaloperations, some version of Thorndike's(1913) positive and negative Law of Effectcan be elaborated upon as a symmetrical re-inforcement mechanism, Rachlin andHerrnstein (1969) have recently made sucha suggestion.

In either case, it is hoped that the multi-ple-response base-line procedure will help torestore the interaction between punishmenttheory and punishment data which has beenlacking in the punishment literature.

of shock would have to be more "drive" or "re-sponse" specific than he has assumed. Estes (1969,p. 69), in his more recent theorizing, makes anassumption very similar to Konorski's. Specifically,he suggests that the activation of negative drivesystems (e.g., shock-produced attack or flight) re-sults in the inhibition of positive drive systems,which, in turn, accounts for the decrement inpunished responding. Again, it should be notedthat the gerbil data suggest that the only positivedrive system which appears to be permanentlyinhibited during punishment is the specificallypunished drive system—or its associated response.

REFERENCES

AZRIN, N. H., & HOLZ, W. C. Punishment. InW. K. Honig (Ed.), Operant behavior: Areasof research and applications. New York: Ap-pleton-Century-Crofts, 1966.

BOLLES, R. C. Species-specific defense reactionsin avoidance learning. Psychological Review,1970, 77, 32-48.

BOLLES, R. C., & POPP, R. J., JR. Parametersaffecting acquisition of Sidman avoidance. Jour-nal of the Experimental Analysis of Behavior,1964, 7, 315-321.

DINSMOOR, J. A. Punishment: I. The avoidancehypothesis. Psychological Review, 1954, 61, 34-46.

DINSMOOR, J. A. Punishment: II. An interpre-tation of empirical findings. Psychological Re-view, 1955, 62, 96-105.

DUNHAM, P. J., & KILPS, B, Shifts in magnitudeof reinforcement: Confounded factors or contrasteffects. Journal of Experimental Psychology,1969, 79, 373-374.

DUNHAM, P. J., MARINER, A., & ADAMS, H. En-hancement of off-key pecking by on-key punish-

ment. Journal of the Experimental Analysis ofBehavior, 1969, 1, 156-166.

ESTES, W. K. Outline of a theory of punish-ment. In B. A. Campbell & R. M. Church(Eds.), Punishment and amrsive behavior.New York: Appleton-Century-Crofts, 1969.

ESTES, W. K., & SKINNER, B. F. Some quantita-tive properties of anxiejty. Journal of Experi-mental Psychology, 1941, 29, 390-400.

FALK, J. L. The motivational properties of sched-ule induced polydipsia. Journal of the Experi-mental Analysis of Behavior, 1966, 9, 19-25.

KONORSKI, J. Integrative activity of the brain.An interdisciplinary approach. Chicago: Uni-versity of Chicago Press, 1967.

MAIER, S. F., SELIGMAN, M. E. P., & SOLOMON,R. L. Pavlovian fear conditioning and learnedhelplessness. In B. A. Campbell & R. M. Church(Eds.), Punishment and aversive behavior. NewYork: Appleton-Century-Crofts, 1969.

MILLER, N. E., & DOLLARD, J. Social learning andimitation. New Haven: Yale University Press,1941.

MORSE, W. H., MEAD, R. N., & KELLEHER, R. T.Modulation of elicited behavior by a fixed-in-terval schedule of electric shock presentation.Science, 1967, 1S7, 215-217.

MOWRER, O. H. On the dual nature of learninga reinterpretation of "conditioning" and "prob-lem-solving." Harvard Educational Review,1947, 17, 102-148.

PAVLOV, I. V. Conditioned reflexes. (Trans, byG. V. Anrep) New York: Dover, 1927.

POWELL, D. A., & CREER, T. L. Interaction of de-velopmental and environmental variables in shock-elicited aggression. Journal of Comparative andPhysiological Psychology, 1969, 69, 219-226.

PREMACK, D. Reinforcement theory. NebraskaSymposium on Motivation, 1965, 13, 123-188.

RACHLIN, H., & HERRNSTEIN, R. J. Hedonism re-visited: On the negative law of effect. In B. A.Campbell & R. M. Church (Eds.), Punishmentand aversive behavior. New York: Appleton-Century-Crofts, 1969.

SCHUSTER, R., & RACHLIN, H. Indifference be-tween punishment and free shock: Evidence forthe negative law of effect. Journal of the Ex-perimental Analysis of Behavior, 1968, 11, 777-786.

SOLOMON, R. L. Punishment. American Psycho-logist, 1964, 19, 239-253.

THORNDIKE, E. L. Educational psychology. Vol.2. The psychology of learning. New York:Columbia University, Teacher's College, Bureauof Publications, 1913.

THORNDIKE, E. L. The fundamentals of learning.New York: Columbia University, Teacher'sCollege, Bureau of Publications, 1932.

(Received April 21, 1970)