17
VOL. 63, No. 1 MARCH, 1956 THE PSYCHOLOGICAL REVIEW THE MAGICAL NUMBER SEVEN, PLUS OR MINUS TWO: SOME LIMITS ON OUR CAPACITY FOR PROCESSING INFORMATION 1 GEORGE A. MILLER Harvard University My problem is that I have been perse- cuted by an integer. For seven years this number has followed me around, has intruded in my most private data, and has assaulted me from the pages of our most public journals. This number as- sumes a variety of disguises, being some- times a little larger and sometimes a little smaller than usual, but never changing so much as to be unrecogniz- able. The persistence with which this number plagues me is far more than a random accident. There is, to quote a famous senator, a design behind it, some pattern governing its appearances. Either there really is something unusual about the number or else I am suffering from delusions of persecution. I shall begin my case history by tell- ing you about some experiments that tested how accurately people can assign numbers to the magnitudes of various aspects of a stimulus. In the tradi- tional language of psychology these would be called experiments in absolute 1 This paper was first read as an Invited Address before the Eastern Psychological As- sociation in Philadelphia on April IS, 19SS. Preparation of the paper was supported by the Harvard Psycho-Acoustic Laboratory un- der Contract NSori-76 between Harvard Uni- versity and the Office of Naval Research, U. S. Navy (Project NR142-201, Report PNR-174). Reproduction for any purpose of the U. S. Government is permitted. judgment. Historical accident, how- ever, has decreed that they should have another name. We now call them ex- periments on the capacity of people to transmit information. Since these ex- periments would not have been done without the appearance of information theory on the psychological scene, and since the results are analyzed in terms of the concepts of information theory, I shall have to preface my discussion with a few remarks about this theory. INFORMATION MEASUREMENT The "amount of information" is ex- actly the same concept that we have talked about for years under the name of "variance." The equations are dif- ferent, but if we hold tight to the idea that anything that increases the vari- ance also increases the amount of infor- mation we cannot go far astray. The advantages of this new way of talking about variance are simple enough. Variance is always stated in terms of the unit of measurement— inches, pounds, volts, etc.—whereas the amount of information is a dimension- less quantity. Since the information in a discrete statistical distribution does not depend upon the unit of measure- ment, we can extend the concept to situations where we have no metric and we would not ordinarily think of using 81

Miller 1956 the Magical Number Seven, Plus or Minus Two

Embed Size (px)

Citation preview

  • VOL. 63, No. 1 MARCH, 1956

    THE PSYCHOLOGICAL REVIEWTHE MAGICAL NUMBER SEVEN, PLUS OR MINUS TWO:

    SOME LIMITS ON OUR CAPACITY FORPROCESSING INFORMATION 1

    GEORGE A. MILLERHarvard University

    My problem is that I have been perse-cuted by an integer. For seven yearsthis number has followed me around, hasintruded in my most private data, andhas assaulted me from the pages of ourmost public journals. This number as-sumes a variety of disguises, being some-times a little larger and sometimes alittle smaller than usual, but neverchanging so much as to be unrecogniz-able. The persistence with which thisnumber plagues me is far more thana random accident. There is, to quotea famous senator, a design behind it,some pattern governing its appearances.Either there really is something unusualabout the number or else I am sufferingfrom delusions of persecution.

    I shall begin my case history by tell-ing you about some experiments thattested how accurately people can assignnumbers to the magnitudes of variousaspects of a stimulus. In the tradi-tional language of psychology thesewould be called experiments in absolute

    1This paper was first read as an InvitedAddress before the Eastern Psychological As-sociation in Philadelphia on April IS, 19SS.Preparation of the paper was supported bythe Harvard Psycho-Acoustic Laboratory un-der Contract NSori-76 between Harvard Uni-versity and the Office of Naval Research, U. S.Navy (Project NR142-201, Report PNR-174).Reproduction for any purpose of the U. S.Government is permitted.

    judgment. Historical accident, how-ever, has decreed that they should haveanother name. We now call them ex-periments on the capacity of people totransmit information. Since these ex-periments would not have been donewithout the appearance of informationtheory on the psychological scene, andsince the results are analyzed in termsof the concepts of information theory,I shall have to preface my discussionwith a few remarks about this theory.

    INFORMATION MEASUREMENTThe "amount of information" is ex-

    actly the same concept that we havetalked about for years under the nameof "variance." The equations are dif-ferent, but if we hold tight to the ideathat anything that increases the vari-ance also increases the amount of infor-mation we cannot go far astray.

    The advantages of this new wayof talking about variance are simpleenough. Variance is always stated interms of the unit of measurementinches, pounds, volts, etc.whereas theamount of information is a dimension-less quantity. Since the information ina discrete statistical distribution doesnot depend upon the unit of measure-ment, we can extend the concept tosituations where we have no metric andwe would not ordinarily think of using

    81

  • GEORGE A. MILLER

    the variance. And it also enables us tocompare results obtained in quite dif-ferent experimental situations where itwould be meaningless to compare vari-ances based on different metrics. Sothere are some good reasons for adopt-ing the newer concept.

    The similarity of variance and amountof information might be explained thisway: When we have a large variance,we are very ignorant about what is go-ing to happen. If we are very ignorant,then when we make the observation itgives us a lot of information. On theother hand, if the variance is very small,we know in advance how our observa-tion must come out, so we get little in-formation from making the observation.

    If you will now imagine a communi-cation system, you will realize thatthere is a great deal of variability aboutwhat goes into the system and also agreat deal of variability about whatcomes out. The input and the outputcan therefore be described in terms oftheir variance (or their information).If it is a good communication system,however, there must be some system-atic relation between what goes in andwhat comes out. That is to say, theoutput will depend upon the input, orwill be correlated with the input. If wemeasure this correlation, then we cansay how much of the output variance isattributable to the input and how muchis due to random fluctuations or "noise"introduced by the system during trans-mission. So we see that the measureof transmitted information is simply ameasure of the input-output correlation.

    There are two simple rules to follow.Whenever I refer to "amount of in-formation," you will understand "vari-ance." And whenever I refer to "amountof transmitted information," you willunderstand "covariance" or "correla-tion."

    The situation can be described graphi-cally by two partially overlapping cir-

    cles. Then the left circle can be takento represent the variance of the input,the right circle the variance of the out-put, and the overlap the covariance ofinput and output. I shall speak of theleft circle as the amount of input infor-mation, the right circle as the amountof output information, and the overlapas the amount of transmitted informa-tion.

    In the experiments on absolute judg-ment, the observer is considered to bea communication channel. Then theleft circle would represent the amountof information in the stimuli, the rightcircle the amount of information in hisresponses, and the overlap the stimulus-response correlation as measured by theamount of transmitted information. Theexperimental problem is to increase theamount of input information and tomeasure the amount of transmitted in-formation. If the observer's absolutejudgments are quite accurate, thennearly all of the input information willbe transmitted and will be recoverablefrom his responses. If he makes errors,then the transmitted information maybe considerably less than the input. Weexpect that, as we increase the amountof input information, the observer willbegin to make more and more errors;we can test the limits of accuracy of hisabsolute judgments. If the human ob-server is a reasonable kind of communi-cation system, then when we increasethe amount of input information thetransmitted information will increase atfirst and will eventually level off at someasymptotic value. This asymptotic valuewe take to be the channel capacity ofthe observer: it represents the greatestamount of information that he can giveus about the stimulus on the basis ofan absolute judgment. The channel ca-pacity is the upper limit on the extentto which the observer can match his re-sponses to the stimuli we give him.

    Now just a brief word about the bit

  • THE MAGICAL NUMBER SEVEN 83

    and we can begin to look at some data.One bit of information is the amount ofinformation that we need to make adecision between two equally likely al-ternatives. If we must decide whethera man is less than six feet tall or morethan six feet tall and if we know thatthe chances are SO-SO, then we needone bit of information. Notice thatthis unit of information does not referin any way to the unit of length thatwe usefeet, inches, centimeters, etc.However you measure the man's height,we still need just one bit of information.

    Two bits of information enable us todecide among four equally likely alter-natives. Three bits of information en-able us to decide among eight equallylikely alternatives. Four bits of infor-mation decide among 16 alternatives,five among 32, and so on. That is tosay, if there are 32 equally likely alter-natives, we must make five successivebinary decisions, worth one bit each, be-fore we know which alternative is cor-rect. So the general rule is simple:every time the number of alternativesis increased by a factor of two, one bitof information is added.

    There are two ways we might in-crease the amount of input information.We could increase the rate at which wegive information to the observer, so thatthe amount of information per unit timewould increase. Or we could ignore thetime variable completely and increasethe amount of input information byincreasing the number of alternativestimuli. In the absolute judgment ex-periment we are interested in the secondalternative. We give the observer asmuch time as he wants to make his re-sponse; we simply increase the numberof alternative stimuli among which hemust discriminate and look to see whereconfusions begin to occur. Confusionswill appear near the point that we arecalling his "channel capacity."

    ABSOLUTE JUDGMENTS OF UNI-DIMENSIONAL STIMULI

    Now let us consider what happenswhen we make absolute judgments oftones. Pollack (17) asked listeners toidentify tones by assigning numerals tothem. The tones were different with re-spect to frequency, and covered therange from 100 to 8000 cps in equallogarithmic steps. A tone was soundedand the listener responded by giving anumeral. After the listener had madehis response he was told the correctidentification of the tone.

    When only two or three tones wereused the listeners never confused them.With four different tones confusionswere quite rare, but with five or moretones confusions were frequent. Withfourteen different tones the listenersmade many mistakes.

    These data are plotted in Fig. 1.Along the bottom is the amount of in-put information in bits per stimulus.As the number of alternative tones wasincreased from 2 to 14, the input infor-mation increased from 1 to 3.8 bits. Onthe ordinate is plotted the amount of

    a 2UJ

    enz

    _2.5BITS

    PITCHES100-8000 CPS

    0 1 2 3 4 5

    INPUT INFORMATION

    FIG. 1. Data from Pollack (17, 18) on theamount of information that is transmitted bylisteners who make absolute judgments ofauditory pitch. As the amount of input in-formation is increased by increasing from 2to 14 the number of different pitches to bejudged, the amount of transmitted informa-tion approaches as its upper limit a channelcapacity of about 2.S bits per judgment.

  • 84 GEORGE A. MILLER

    transmitted information. The amountof transmitted information behaves inmuch the way we would expect a com-munication channel to behave; the trans-mitted information increases linearly upto about 2 bits and then bends off to-ward an asymptote at about 2.5 bits.This value, 2.5 bits, therefore, is whatwe are calling the channel capacity ofthe listener for absolute judgments ofpitch.

    So now we have the number 2.5bits. What does it mean? First, notethat 2.5 bits corresponds to about sixequally likely alternatives. The resultmeans that we cannot pick more thansix different pitches that the listener willnever confuse. Or, stated slightly dif-ferently, no matter how many alterna-tive tones we ask him to judge, the bestwe can expect him to do is to assignthem to about six different classes with-out error. Or, again, if we know thatthere were N alternative stimuli, thenhis judgment enables us to narrow downthe particular stimulus to one out ofN/6.

    Most people are surprised that thenumber is as small as six. Of course,there is evidence that a musically so-phisticated person with absolute pitchcan identify accurately any one of 50or 60 different pitches. Fortunately, Ido not have time to discuss these re-markable exceptions. I say it is for-tunate because I do not know how toexplain their superior performance. SoI shall stick to the more pedestrian factthat most of us can identify about oneout of only five or six pitches before webegin to get confused.

    It is interesting to consider that psy-chologists have been using seven-pointrating scales for a long time, on theintuitive basis that trying to rate intofiner categories does not really add muchto the usefulness of the ratings. Pol-lack's results indicate that, at least forpitches, this intuition is fairly sound.

    1 2 3 4 5INPUT INFORMATION

    FIG. 2. Data from Garner (7) on the chan-nel capacity for absolute judgments of audi-tory loudness.

    Next you can ask how reproduciblethis result is. Does it depend on thespacing of the tones or the various con-ditions of judgment? Pollack variedthese conditions in a number of ways.The range of frequencies can be changedby a factor of about 20 without chang-ing the amount of information trans-mitted more than a small percentage.Different groupings of the pitches de-creased the transmission, but the losswas small. For example, if you candiscriminate five high-pitched tones inone series and five low-pitched tones inanother series, it is reasonable to ex-pect that you could combine all ten intoa single series and still tell them allapart without error. When you try it,however, it does not work. The chan-nel capacity for pitch seems to be aboutsix and that is the best you can do.

    While we are on tones, let us looknext at Garner's (7) work on loudness.Garner's data for loudness are sum-marized in Fig. 2. Garner went to sometrouble to get the best possible spacingof his tones over the intensity rangefrom 15 to 110 db. He used 4, 5, 6, 7,10, and 20 different stimulus intensities.The results shown in Fig. 2 take intoaccount the differences among subjectsand the sequential influence of the im-mediately preceding judgment. Againwe find that there seems to be a limit.

  • THE MAGICAL NUMBER SEVEN 85

    TASTJUDGMENTS OF SALINE

    CONCENTRATION

    1 2 3 4

    INPUT INFORMATION

    FIG. 3. Data from Beebe-Center, Rogers,and O'Connell (1) on the channel capacity forabsolute judgments of saltiness.

    The channel capacity for absolute judg-ments of loudness is 2.3 bits, or aboutfive perfectly discriminable alternatives.

    Since these two studies were done indifferent laboratories with slightly dif-ferent techniques and methods of analy-sis, we are not in a good position toargue whether five loudnesses is signifi-cantly different from six pitches. Prob-ably the difference is in the right direc-tion, and absolute judgments of pitchare slightly more accurate than absolutejudgments of loudness. The importantpoint, however, is that the two answersare of the same order of magnitude.

    The experiment has also been donefor taste intensities. In Fig. 3 are theresults obtained by Beebe-Center, Rog-ers, and O'Connell (1) for absolutejudgments of the concentration of saltsolutions. The concentrations rangedfrom 0.3 to 34.7 gm. NaCl per 100cc. tap water in equal subjective steps.They used 3, 5, 9, and 17 different con-centrations. The channel capacity is1.9 bits, which is about four distinctconcentrations. Thus taste intensitiesseem a little less distinctive than audi-tory stimuli, but again the order ofmagnitude is not far off.

    On the other hand, the channel ca-pacity for judgments of visual positionseems to be significantly larger. Hake

    and Garner (8) asked observers to in-terpolate visually between two scalemarkers. Their results are shown inFig. 4. They did the experiment intwo ways. In one version they let theobserver use any number between zeroand 100 to describe the position, al-though they presented stimuli at only5, 10, 20, or SO different positions. Theresults with this unlimited responsetechnique are shown by the filled circleson the graph. In the other version theobservers were limited in their re-sponses to reporting just those stimu-lus values that were possible. That isto say, in the second version the num-ber of different responses that the ob-server could make was exactly the sameas the number of different stimuli thatthe experimenter might present. Theresults with this limited response tech-nique are shown by the open circles onthe graph. The two functions are sosimilar that it seems fair to concludethat the number of responses availableto the observer had nothing to do withthe channel capacity of 3.2S bits.

    The Hake-Garner experiment has beenrepeated by Coonan and Klemmer. Al-though they have not yet publishedtheir results, they have given me per-mission to say that they obtained chan-nel capacities ranging from 3.2 bits for

    -3.25. BITS

    8z

    100

    0 1 2 3 4 5 6

    INPUT INFORMATION

    FIG. 4. Data from Hake and Garner (8)on the channel capacity for absolute judg-ments of the position of a pointer in a linearinterval.

  • 86 GEORGE A. MILLER

    very short exposures of the pointer po-sition to 3.9 bits for longer exposures.These values are slightly higher thanHake and Garner's, so we must con-clude that there'are between 10 and ISdistinct positions along a linear inter-val. This is the largest channel ca-pacity that has been measured for anyunidimensional variable.

    At the present time these four experi-ments on absolute judgments of simple,unidimensional stimuli are all that haveappeared in the psychological journals.However, a great deal of work on otherstimulus variables has not yet appearedin the journals. For example, Eriksenand Hake (6) have found that thechannel capacity for judging the sizesof squares is 2.2 bits, or about fivecategories, under a wide range of ex-perimental conditions. In a separateexperiment Eriksen (5) found 2.8 bitsfor size, 3.1 bits for hue, and 2.3 bitsfor brightness. Geldard has measuredthe channel capacity for the skin byplacing vibrators on the chest region.A good observer can identify about fourintensities, about five durations, andabout seven locations.

    One of the most active groups in thisarea has been the Air Force OperationalApplications Laboratory. Pollack hasbeen kind enough to furnish me withthe results of their measurements forseveral aspects of visual displays. Theymade measurements for area and forthe curvature, length, and direction oflines. In one set of experiments theyused a very short exposure of the stimu-lus%0 secondand then they re-peated the measurements with a 5-second exposure. For area they got2.6 bits with the short exposure and2.7 bits with the long exposure. Forthe length of a line they got about 2.6bits with the short exposure and about3.0 bits with the long exposure. Direc-tion, or angle of inclination, gave 2.8bits for the short exposure and 3.3 bits

    for the long exposure. Curvature wasapparently harder to judge. When thelength of the arc was constant, the re-sult at the short exposure duration was2.2 bits, but when the length of thechord was constant, the result was only1.6 bits. This last value is the lowestthat anyone has measured to date. Ishould add, however, that these valuesare apt to be slightly too low becausethe data from all subjects were pooledbefore the transmitted information wascomputed.

    Now let us see where we are. First,the channel capacity does seem to be avalid notion for describing human ob-servers. Second, the channel capacitiesmeasured for these unidimensional vari-ables range from 1.6 bits for curvatureto 3.9 bits for positions in an interval.Although there is no question that thedifferences among the variables are realand meaningful, the more impressivefact to me is their considerable simi-larity. If I take the best estimates Ican get of the channel capacities for allthe stimulus variables I have mentioned,the mean is 2.6 bits and the standarddeviation is only 0.6 bit. In terms ofdistinguishable alternatives, this meancorresponds to about 6.5 categories, onestandard deviation includes from 4 to10 categories, and the total range isfrom 3 to IS categories. Consideringthe wide variety of different variablesthat have been studied, I find this tobe a remarkably narrow range.

    There seems to be some limitationbuilt into us either by learning or bythe design of our nervous systems, alimit that keeps our channel capacitiesin this general range. On the basis ofthe present evidence it seems safe tosay that we possess a finite and rathersmall capacity for making such unidi-mensional judgments and that this ca-pacity does not vary a great deal fromone simple sensory attribute to another.

  • THE MAGICAL NUMBER SEVEN 87

    ABSOLUTE JUDGMENTS OF MULTI-DIMENSIONAL STIMULI

    You may have noticed that I havebeen careful to say that this magicalnumber seven applies to one-dimensionaljudgments. Everyday experience teachesus that we can identify accurately anyone of several hundred faces, any oneof several thousand words, any one ofseveral thousand objects, etc. The storycertainly would not be complete if westopped at this point. We must havesome understanding of why the one-dimensional variables we judge in thelaboratory give results so far out ofline with what we do constantly in ourbehavior outside the laboratory. A pos-sible explanation lies in the number ofindependently variable attributes of thestimuli that are being judged. Objects,faces, words, and the like differ fromone another in many ways, whereas thesimple stimuli we have considered thusfar differ from one another in only onerespect.

    Fortunately, there are a few data onwhat happens when we make absolutejudgments of stimuli that differ fromone another in several ways. Let uslook first at the results Klemmer andFrick (13) have reported for the abso-lute judgment of the position of a dotin a square. In Fig. 5 we see their re-

    POINTS IN A SQUARENO GRID

    .03 SEC. EXPOSURE

    3 4 5 6 7INPUT INFORMATION

    FIG. S. Data from Klemmer and Frick (13)on the channel capacity for absolute judg-ments of the position of a dot in a square.

    suits. Now the channel capacity seemsto have increased to 4.6 bits, whichmeans that people can identify accu-rately any one of 24 positions in thesquare.

    The position of a dot in a square isclearly a two-dimensional proposition.Both its horizontal and its vertical po-sition must be identified. Thus it seemsnatural to compare the 4.6-bit capacityfor a square with the 3.25-bit capacityfor the position of a point in an inter-val. The point in the square requirestwo judgments of the interval type. Ifwe have a capacity of 3.2S bits for esti-mating intervals and we do this twice,we should get 6.5 bits as our capacityfor locating points in a square. Addingthe second independent dimension givesus an increase from 3.2S to 4.6, but itfalls short of the perfect addition thatwould give 6.5 bits.

    Another example is provided by Beebe-Center, Rogers, and O'Connell. Whenthey asked people to identify both thesaltiness and the sweetness of solutionscontaining various concentrations of saltand sucrose, they found that the chan-nel capacity was 2.3 bits. Since the ca-pacity for salt alone was 1.9, we mightexpect about 3.8 bits if the two aspectsof the compound stimuli were judgedindependently. As with spatial loca-tions, the second dimension adds a littleto the capacity but not as much as itconceivably might.

    A third example is provided by Pol-lack (18), who asked listeners to judgeboth the loudness and the pitch of puretones. Since pitch gives 2.S bits andloudness gives 2.3 bits, we might hopeto get as much as 4.8 bits for pitch andloudness together. Pollack obtained 3.1bits, which again indicates that thesecond dimension augments the channelcapacity but not so much as it might.

    A fourth example can be drawn fromthe work of Halsey and Chapanis (9)on confusions among colors of equal

  • 88 GEORGE A. MILLER

    luminance. Although they did not ana-lyze their results in informational terms,they estimate that there are about 11 toIS identifiable colors, or, in our terms,about 3.6 bits. Since these colors variedin both hue and saturation, it is prob-ably correct to regard this as a two-dimensional judgment. If we comparethis with Eriksen's 3.1 bits for hue(which is a questionable comparison todraw), we again have something lessthan perfect addition when a seconddimension is added.

    It is still a long way, however, fromthese two-dimensional examples to themultidimensional stimuli provided byfaces, words, etc. To fill this gap wehave only one experiment, an auditorystudy done by Pollack and Picks (19).They managed to get six different acous-tic variables that they could change:frequency, intensity, rate of interrup-tion, on-time fraction, total duration,and spatial location. Each one of thesesix variables could assume any one offive different values, so altogether therewere S8, or 15,625 different tones thatthey could present. The listeners madea separate rating for each one of thesesix dimensions. Under these conditionsthe transmitted information was 7.2 bits,which corresponds to about 150 differ-ent categories that could be absolutelyidentified without error. Now we arebeginning to get up into the range thatordinary experience would lead us toexpect.

    Suppose that we plot these data,fragmentary as they are, and make aguess about how the channel capacitychanges with the dimensionality of thestimuli. The result is given in Fig. 6.In a moment of considerable daring Isketched the dotted line to indicateroughly the trend that the data seemedto be taking.

    Clearly, the addition of independentlyvariable attributes to the stimulus in-creases the channel capacity, but at a

    I 2 3 4 5 6 7

    NUMBER OF VARIABLE ASPECTS

    FIG. 6. The general form of the relation be-tween channel capacity and the number of in-dependently variable attributes of the stimuli.

    decreasing rate. It is interesting tonote that the channel capacity is in-creased even when the several variablesare not independent. Eriksen (5) re-ports that, when size, brightness, andhue all vary together in perfect correla-tion, the transmitted information is 4.1bits as compared with an average ofabout 2.7 bits when these attributes arevaried one at a time. By confoundingthree attributes, Eriksen increased thedimensionality of the input without in-creasing the amount of input informa-tion ; the result was an increase in chan-nel capacity of about the amount thatthe dotted function in Fig. 6 would leadus to expect.

    The point seems to be that, as weadd more variables to the display, weincrease the total capacity, but we de-crease the accuracy for any particularvariable. In other words, we can makerelatively crude judgments of severalthings simultaneously.

    We might argue that in the course ofevolution those organisms were mostsuccessful that were responsive to thewidest range of stimulus energies intheir environment. In order to survivein a constantly fluctuating world, it wasbetter to have a little information abouta lot of things than to have a lot of in-formation about a small segment of the

  • THE MAGICAL NUMBER SEVEN 89

    environment. If a compromise was nec-essary, the one we seem to have made isclearly the more adaptive.

    Pollack and Picks's results are verystrongly suggestive of an argument thatlinguists and phoneticians have beenmaking for some time (11). Accordingto the linguistic analysis of the soundsof human speech, there are about eightor ten dimensionsthe linguists callthem distinctive featuresthat distin-guish one phoneme from another. Thesedistinctive features are usually binary,or at most ternary, in nature. For ex-ample, a binary distinction is made be-tween vowels and consonants, a binarydecision is made between oral and nasalconsonants, a ternary decision is madeamong front, middle, and back pho-nemes, etc. This approach gives usquite a different picture of speech per-ception than we might otherwise obtainfrom our studies of the speech spectrumand of the ear's ability to discriminaterelative differences among pure tones.I am personally much interested in thisnew approach (15), and I regret thatthere is not time to discuss it here.

    It was probably with this linguistictheory in mind that Pollack and Picksconducted a test on a set of tonalstimuli that varied in eight dimensions,but required only a binary decision oneach dimension. With these tones theymeasured the transmitted informationat 6.9 bits, or about 120 recognizablekinds of sounds. It is an intriguingquestion, as yet unexplored, whetherone can go on adding dimensions in-definitely in this way.

    In human speech there is clearly alimit to the number of dimensions thatwe use. In this instance, however, it isnot known whether the limit is imposedby the nature of the perceptual ma-chinery that must recognize the soundsor by the nature of the speech ma-chinery that must produce them. Some-body will have to do the experiment to

    find out. There is a limit, however, atabout eight or nine distinctive featuresin every language that has been studied,and so when we talk we must resort tostill another trick for increasing ourchannel capacity. Language uses se-quences of phonemes, so we make sev-eral judgments successively when welisten to words and sentences. That isto say, we use both simultaneous andsuccessive discriminations in order toexpand the rather rigid limits imposedby the inaccuracy of our absolute judg-ments of simple magnitudes.

    These multidimensional judgments arestrongly reminiscent of the abstractionexperiment of Kulpe (14). As you mayremember, Kiilpe showed that observersreport more accurately on an attributefor which they are set than on attributesfor which they are not set. For exam-ple, Chapman (4) used three differentattributes and compared the results ob-tained when the. observers were in-structed before the tachistoscopic pres-entation with the results obtained whenthey were not told until after the pres-entation which one of the three attri-butes was to be reported. When theinstruction was given in advance, thejudgments were more accurate. Whenthe instruction was given afterwards,the subjects presumably had to judge allthree attributes in order to report onany one of them and the accuracy wascorrespondingly lower. This is in com-plete accord with the results we havejust been considering, where the ac-curacy of judgment on each attributedecreased as more dimensions wereadded. The point is probably obvious,but I shall make it anyhow, that theabstraction experiments did not demon-strate that people can judge only oneattribute at a time. They merely showedwhat seems quite reasonable, that peo-ple are less accurate if they must judgemore than one attribute simultaneously.

  • 90 GEORGE A. MILLER

    SUBITIZING

    I cannot leave this general area with-out mentioning, however briefly, the ex-periments conducted at Mount HolyokeCollege on the discrimination of num-ber (12). In experiments by Kaufman,Lord, Reese, and Volkmann randompatterns of dots were flashed on a screenfor y5 of a second. Anywhere from 1to more than 200 dots could appear inthe pattern. The subject's task was toreport how many dots there were.

    The first point to note is that on pat-terns containing up to five or six dotsthe subjects simply did not make errors.The performance on these small num-bers of dots was so different from theperformance with more dots that it wasgiven a special name. Below seven thesubjects were said to subitize; aboveseven they were said to estimate. Thisis, as you will recognize, what we onceoptimistically called "the span of atten-tion."

    This discontinuity at seven is, ofcourse, suggestive. Is this the samebasic process that limits our unidimen-sional judgments to about seven cate-gories? The generalization is tempting,but not sound in my opinion. The dataon number estimates have not been ana-lyzed in informational terms; but onthe basis of the published data I wouldguess that the subjects transmittedsomething more than four bits of in-formation about the number of dots.Using the same arguments as before, wewould conclude that there are about 20or 30 distinguishable categories of nu-merousness. This is considerably moreinformation than we would expect toget from a unidimensional display. Itis, as a matter of fact, very much like atwo-dimensional display. Although thedimensionality of the random dot pat-terns is not entirely clear, these resultsare in the same range as Klemmer andFrick's for their two-dimensional dis-play of dots in a square. Perhaps the

    two dimensions of numerousness arearea and density. When the subjectcan subitize, area and density may notbe the significant variables, but whenthe subject must estimate perhaps theyare significant. In any event, the com-parison is not so simple as it mightseem at first thought.

    This is one of the ways in which themagical number seven has persecutedme. Here we have two closely relatedkinds of experiments, both of whichpoint to the significance of the numberseven as a limit on our capacities. Andyet when we examine the matter moreclosely, there seems to be a reasonablesuspicion that it is nothing more thana coincidence.

    THE SPAN OF IMMEDIATE MEMOKYLet me summarize the situation in

    this way. There is a clear and definitelimit to the accuracy with which we canidentify absolutely the magnitude ofa unidimensional stimulus variable. Iwould propose to call this limit thespan of absolute judgment, and Imaintain that for unidimensional judg-ments this span is usually somewherein the neighborhood of seven. We arenot completely at the mercy of thislimited span, however, because we havea variety of techniques for gettingaround it and increasing the accuracyof our judgments. The three most im-portant of these devices are (a) tomake relative rather than absolute judg-ments; or, if that is not possible, (b)to increase the number of dimensionsalong which the stimuli can differ; or(c) to arrange the task in such a waythat we make a sequence of several ab-solute judgments in a row.

    The study of relative judgments isone of the oldest topics in experimentalpsychology, and I will not pause to re-view it now. The second device, in-creasing the dimensionality, we have justconsidered. It seems that by adding

  • THE MAGICAL NUMBER SEVEN 91

    more dimensions and requiring crude,binary, yes-no judgments on each at-tribute we can extend the span of abso-lute judgment from seven to at leastISO. Judging from our everyday be-havior, the limit is probably in thethousands, if indeed there is a limit. Inmy opinion, we cannot go on compound-ing dimensions indefinitely. I suspectthat there is also a span of perceptualdimensionality and that this span issomewhere in the neighborhood of ten,but I must add at once that there is noobjective evidence to support this sus-picion. This is a question sadly need-ing experimental exploration.

    Concerning the third device, the useof successive judgments, I have quite abit to say because this device introducesmemory as the handmaiden of discrimi-nation. And, since mnemonic processesare at least as complex as are perceptualprocesses, we can anticipate that theirinteractions will not be easily disen-tangled.

    Suppose that we start by simply ex-tending slightly the experimental pro-cedure that we have been using. Upto this point we have presented a singlestimulus and asked the observer to nameit immediately thereafter. We can ex-tend this procedure by requiring the ob-server to withhold his response until wehave given him several stimuli in suc-cession. At the end of the sequence ofstimuli he then makes his response. Westill have the same sort of input-out-put situation that is required for themeasurement of transmitted informa-tion. But now we have passed froman experiment on absolute judgment towhat is traditionally called an experi-ment on immediate memory.

    Before we look at any data on thistopic I feel I must give you a word ofwarning to help you avoid some obvi-ous associations that can be confusing.Everybody knows that there is a finitespan of immediate memory and that for

    a lot of different kinds of test materialsthis span is about seven items in length.I have just shown you that there is aspan of absolute judgment that can dis-tinguish about seven categories and thatthere is a span of attention that willencompass about six objects at a glance.What is more natural than to think thatall three of these spans are different as-pects of a single underlying process?And that is a fundamental mistake, asI shall be at some pains to demonstrate.This mistake is one of the maliciouspersecutions that the magical numberseven has subjected me to.

    My mistake went something like this.We have seen that the invariant fea-ture in the span of absolute judgmentis the amount of information that theobserver can transmit. There is a realoperational similarity between the ab-solute judgment experiment and theimmediate memory experiment. If im-mediate memory is like absolute judg-ment, then it should follow that the in-variant feature in the span of immediatememory is also the amount of informa-tion that an observer can retain. If theamount of information in the span ofimmediate memory is a constant, thenthe span should be short when the indi-vidual items contain a lot of informa-tion and the span should be long whenthe items contain little information. Forexample, decimal digits are worth 3.3bits apiece. We can recall about sevenof them, for a total of 23 bits of in-formation. Isolated English words areworth about 10 bits apiece. If the totalamount of information is to remainconstant at 23 bits, then we should beable to remember only two or threewords chosen at random. In this wayI generated a theory about how the spanof immediate memory should vary as afunction of the amount of informationper item in the test materials.

    The measurements of memory span inthe literature are suggestive on this

  • 92 GEORGE A. MILLER

    question, but not definitive. And so itwas necessary to do the experiment tosee. Hayes (10) tried it out with fivedifferent kinds of test materials: binarydigits, decimal digits, letters of the al-phabet, letters plus decimal digits, andwith 1,000 monosyllabic words. Thelists were read aloud at the rate of oneitem per second and the subjects had asmuch time as they needed to give theirresponses. A procedure described byWoodworth (20) was used to score theresponses.

    The results are shown by the filledcircles in Fig. 7. Here the dotted lineindicates what the span should havebeen if the amount of information in thespan were constant. The solid curvesrepresent the data. Hayes repeated theexperiment using test vocabularies ofdifferent sizes but all containing onlyEnglish monosyllables (open circles inFig. 7). This more homogeneous testmaterial did not change the picture sig-nificantly. With binary items the spanis about nine and, although it drops toabout five with monosyllabic Englishwords, the difference is far less thanthe hypothesis of constant informationwould require.

    1O22z

    tft2K

    fco:m53z

    BINARY DECIMAL [-BETTERS |000DIGITS DIGITS .-LETTERS a DIGITS WORDS

    P- J i U 1DU

    40

    30

    20

    10

    O

    >i* CONSTANT\INFORMATION\\\

    \\\

    s

    -y, t*

    >^ 40

    S 30

    ez 20

    10

    PREDICTEDFROM SPAN FOROCTAL DIGITS

    ,*/ N OBSERVED

    ONE HIGHLY PRACTICED SUBJECT

    2:1 3:1 4:1 SM

    RECOOINO RATIO

    FIG. 9. The span of immediate memory forbinary digits is plotted as a function of thereceding procedure used. The predicted func-tion is obtained by multiplying the span foroctals by 2, 3 and 3.3 for receding into base4, base 8, and base 10, respectively.

    a mnemonic trick for extending thememory span, you will miss the moreimportant point that is implicit innearly all such mnemonic devices. Thepoint is that receding is an extremelypowerful weapon for increasing theamount of information that we candeal with. In one form or another weuse receding constantly in our dailybehavior.

    In my opinion the most customarykind of receding that we do all the timeis to translate into a verbal code. Whenthere is a story or an argument or anidea that we want to remember, we usu-ally try to rephrase it "in our ownwords." When we witness some eventwe want to remember, we make a verbaldescription of the event and then re-member our verbalization. Upon recallwe recreate by secondary elaborationthe details that seem consistent withthe particular verbal recoding we hap-pen to have made. The well-known ex-periment by Carmichael, Hogan, andWalter (3) on the influence that nameshave on the recall of visual figures isone demonstration of the process.

    The inaccuracy of the testimony of

    eyewitnesses is well known in legal psy-chology, but the distortions of testi-mony are not randomthey follownaturally from the particular recodingthat the witness used, and the particu-lar recoding he used depends upon hiswhole life history. Our language is tre-mendously useful for repackaging ma-terial into a few chunks rich in infor-mation. I suspect that imagery is aform of recoding, too, but images seemmuch harder to get at operationally andto study experimentally than the moresymbolic kinds of recoding.

    It seems probable that even memori-zation can be studied in these terms.The process of memorizing may be sim-ply the formation of chunks, or groupsof items that go together, until thereare few enough chunks so that we canrecall all the items. The work by Bous-field and Cohen (2) on the occurrenceof clustering in the recall of words isespecially interesting in this respect.

    SUMMARYI have come to the end of the data

    that I wanted to present, so I wouldlike now to make some summarizing re-marks.

    First, the span of absolute judgmentand the span of immediate memory im-pose severe limitations on the amountof information that we are able to re-ceive, process, and remember. By or-ganizing the stimulus input simultane-ously into several dimensions and suc-cessively into a sequence of chunks, wemanage to break (or at least stretch)this informational bottleneck.

    Second, the process of recoding is avery important one in human psychol-ogy and deserves much more explicit at-tention than it has received. In par-ticular, the kind of linguistic recodingthat people do seems to me to be thevery lifeblood of the thought processes.Recoding procedures are a constantconcern to clinicians, social psycholo-

  • 96 GEORGE A. MILLER

    gists, linguists, and anthropologists andyet, probably because receding is lessaccessible to experimental manipulationthan nonsense syllables or T mazes, thetraditional experimental psychologist hascontributed little or nothing to theiranalysis. Nevertheless, experimentaltechniques can be used, methods of re-coding can be specified, behavioral in-dicants can be found. And I anticipatethat we will find a very orderly set ofrelations describing what now seems anuncharted wilderness of individual dif-ferences.

    Third, the concepts and measuresprovided by the theory of informationprovide a quantitative way of getting atsome of these questions. The theoryprovides us with a yardstick for cali-brating our stimulus materials and formeasuring the performance of our sub-jects. In the interests of communica-tion I have suppressed the technical de-tails of information measurement andhave tried to express the ideas in morefamiliar terms; I hope this paraphrasewill not lead you to think they are notuseful in research. Informational con-cepts have already proved valuable inthe study of discrimination and of lan-guage; they promise a great deal in thestudy of learning and memory; and ithas even been proposed that they canbe useful in the study of concept for-mation. A lot of questions that seemedfruitless twenty or thirty years ago maynow be worth another look. In fact, Ifeel that my story here must stop justas it begins to get really interesting.

    And finally, what about the magicalnumber seven? What about the sevenwonders of the world, the seven seas,the seven deadly sins, the seven daugh-ters of Atlas in the Pleiades, the sevenages of man, the seven levels of hell,the seven primary colors, the seven notesof the musical scale, and the seven daysof the week? What about the seven-point rating scale, the seven categories

    for absolute judgment, the seven ob-jects in the span of attention, and theseven digits in the span of immediatememory? For the present I propose towithhold judgment. Perhaps there issomething deep and profound behind allthese sevens, something just calling outfor us to discover it. But I suspectthat it is only a pernicious, Pythagoreancoincidence.

    REFERENCES1. BEEBE-CENTER, J. G., ROGERS, M. S., &

    O'CoNNELL, B. N. Transmission of in-formation about sucrose and saline solu-tions through the sense of taste. J.Psychol., 19SS, 39, 157-160.

    2. BOUSFIELD, W. A., & COHEN, B. H. Theoccurrence of clustering in the recall ofrandomly arranged words of differentfrequencies-of-usage. /. gen. Psychol.,1955, 52, 83-9S.

    3. CARMICHAEL, L., HOGAN, H. P., & WALTER,A. A. An experimental study of theeffect of language on the reproductionof visually perceived form. /. exp.Psychol., 1932, IS, 73-86.

    4. CHAPMAN, D. W. Relative effects of de-terminate and indeterminate Aufgaben.Amer. J. Psychol, 1932, 44, 163-174.

    5. ERIKSEN, C. W. Multidimensional stimu-lus differences and accuracy of discrimi-nation. VSAF, WADC Tech. Rep.,1954, No. 54-165.

    6. ERIKSEN, C. W., & HAKE, H. W. Abso-lute judgments as a function of thestimulus range and the number ofstimulus and response categories. /.exp. Psychol., 1955, 49, 323-332.

    7. GARNER, W. R. An informational analy-sis of absolute judgments of loudness./. exp. Psychol., 1953, 46, 373-380.

    8. HAKE, H. W., & GARNER, W. R. The ef-fect of presenting various numbers ofdiscrete steps on scale reading accuracy,/. exp. Psychol., 1951, 42, 358-366.

    9. HALSEY, R. M., & CHAPANIS, A. Chro-maticity-confusion contours in a com-plex viewing situation. J. Opt. Soc.Amer., 1954, 44, 442-4S4.

    10. HAYES, J. R. M. Memory span for sev-eral vocabularies as a function of vo-cabulary size. In Quarterly ProgressReport, Cambridge, Mass.: AcousticsLaboratory, Massachusetts Institute ofTechnology, Jan.-June, 1952.

  • THE MAGICAL NUMBER SEVEN 97

    11. JAKOBSON, R,, FANT, C. G. M., & HALLE,M, Preliminaries to speech analysis.Cambridge, Mass.: Acoustics Labora-tory, Massachusetts Institute of Tech-nology, 19S2. (Tech. Rep. No. 13.)

    12. KAUFMAN, E. L., LORD, M. W., REESE,T. W., & VOLKMANN, J. The discrimi-nation of visual number. Amer. J.Psychol, 1949, 62, 498-525.

    13. KLEMMEE, E. T., & FRICK, F. C. Assimi-lation of information from dot andmatrix patterns. /. exp. Psychol., 1953,45, 15-19.

    14. KTJLPE, O. Versuche tiber Abstraktion.Ber. u. d. I Kongr. f. exper. Psychol.,1904, 56-68.

    15. MILLER, G. A., & NICELY, P. E. An analy-sis of perceptual confusions among some

    English consonants. J. Acoust. Soc.Amer., 1955, 27, 338-352.

    16. POLLACK, I. The assimilation of sequen-tially encoded information. Amer. J.Psychol., 1953, 66, 421-435.

    17. POLLACK, I. The information of elemen-tary auditory displays. /. Acoust. Soc.Amer., 1952, 24, 745-749.

    18. POLLACK, I. The information of elemen-tary auditory displays. II. /. Acoust.Soc. Amer., 1953, 25, 765-769.

    19. POLLACK, I., & FICKS, L. Information ofelementary multi-dimensional auditorydisplays. /. Acoust. Soc. Amer., 1954,26, 155-158.

    20. WOODWORTH, R. S. Experimental psy-chology. New York: Holt, 1938.

    (Received May 4, 1955)