This article was downloaded by: [UQ Library]
On: 14 November 2014, At: 04:50
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

The Journal of General Psychology
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/vgen20

A Criticism of the Use of the Wechsler-Bellevue Scale as a Diagnostic Instrument
Melvin R. Marks
AGO, Personnel Research Section, USA
Published online: 06 Jul 2010.

To cite this article: Melvin R. Marks (1953) A Criticism of the Use of the Wechsler-Bellevue Scale as a Diagnostic Instrument, The Journal of General Psychology, 49:1, 143-152, DOI: 10.1080/00221309.1953.9710682

To link to this article: http://dx.doi.org/10.1080/00221309.1953.9710682

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

A Criticism of the Use of the Wechsler-Bellevue Scale as a Diagnostic Instrument




The Journal of General Psychology, 1953, 49, 143-152.

A CRITICISM OF THE USE OF THE WECHSLER-BELLEVUE SCALE AS A DIAGNOSTIC INSTRUMENT*

MELVIN R. MARKS

AGO, Personnel Research Section

A. INTRODUCTION

As an instrument by which mental disease syndromes may be differentiated diagnostically, the Wechsler-Bellevue Scale (WB) has been espoused by clinical psychologists with increasing enthusiasm since its introduction in 1939. Its popularity as such an instrument poses the question: What is the rationale for its use?

Wechsler’s (6) description of the scale attempts a justification, and this will be discussed in detail below. Rapaport, Schafer, and Gill (4) in their manual disagree with Wechsler’s interpretation of the verbal-performance dichotomy of the scale. Watson (5), and very recently Rabin and Guertin (3), have summarized and coordinated the findings where the WB has been used with various clinical groups. The essence of the findings with respect to diagnosis based on subtest patterns is that empirical work has not borne out the expectations. Rabin and Guertin suggest that some attention should be concentrated on the measuring instrument itself. The present paper is an effort at a theoretical critique of the diagnostic usefulness of the scale (not its use as an intelligence test). The instrument itself will be examined, and relatively little reference will be made to empirical findings except for some factor analysis material.

B. DEFINITIONS AND CRITERIA

Justification for the use of subtest patterns as diagnostic tools rests upon Wechsler’s statement that, “The ultimate products of intelligent behavior are not only a function of the number of abilities or their quality, but also of the way in which they are combined, that is, upon their configuration” (6, p. 3). Wechsler believes that the verbal-performance dichotomy, and the further breakdown into subtests, have diagnostic implications because: (a) although two individuals might achieve the same total score on the full scale, differential contribution of “verbal” and “performance” abilities might still serve to discriminate between them, or (b) if performance and verbal totals as well as full scale were still equal for the two persons, there might still be interpretable variation in the subtest profiles. The argument then proceeds: if these intratest fluctuations form profiles (patterns) typical of clinically defined groups, then a given pattern might be “diagnosed” with reference to an established group pattern paradigm.

*Received in the Editorial Office on July 16, 1951.

This discussion of the WB must be made with reference to a set of criteria appropriate for any battery that yields patterns to be interpreted diagnostically. The criteria immediately suggested are reliability and validity, with their usual test-theoretical definitions. Two additional criteria are proposed here. A diagnostic test should have efficiency, defined as its capacity to yield its measure without superfluous operations; it should have univocality, defined as its capacity to express individual performance in a manner susceptible of unique interpretation. While reliability and validity may be expressed numerically, efficiency and univocality must be evaluated qualitatively. This critique will not be concerned with the reliability of the WB;¹ that criterion must be satisfied by empirical demonstration. Wechsler has offered reliability coefficients for test-retest correlations, verbal-performance correlations, and the standard error of measurement.

C. THE MECHANICS OF THE DIAGNOSTIC USE OF THE WB

Before examining the WB for efficiency, validity, and univocality, it is necessary to summarize the method by which the pattern to be diagnosed is derived from the raw scores of the test. Each of the raw subtest totals is transformed to a standard (weighted) scale with a mean of 10.00 and an SD of 3.00. To assist the psychometrician, a conversion table is supplied (6, p. 228). Next, a deviation unit is defined: for full scale standard scores in the range 80-110, 1 deviation unit (DU) is 2 standard score points.

¹According to Cronbach (2, p. 151), “For two scores to be compared with each other as in a profile, the difference between them must be reliable. . . . If either one is unreliable, the difference between them is likely to be due to chance.” Cronbach gives data on the equivalence of Wechsler’s Forms I and II by subtest. The coefficients range from 0.34 to 0.80, seven of the subtests having coefficients below 0.80. An estimate of the reliability of a difference between two tests is given by the formula

$$r_{x-y} = \frac{r_{av} - r_{xy}}{1 - r_{xy}}$$

where $r_{av}$ is the average reliability of the two tests and $r_{xy}$ is the correlation between them. If this estimate is applied to Cronbach’s data for the Picture Completion and Comprehension subtests, which have reliabilities of 0.34 and 0.44, respectively, the reliability of the difference between them is indicated by a coefficient $r_{x-y} = -0.15$. Certainly, the stability of a profile including these two subtests is to be questioned!
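The footnote's estimate can be checked numerically. The Python sketch below implements the difference-reliability formula as stated: the average of the two reliabilities minus their intercorrelation, divided by one minus the intercorrelation. The paper does not report the Picture Completion/Comprehension intercorrelation it used. Solving the formula backward from the reported -0.15 gives a value near 0.47. That back-solved value is an assumption made here, not a figure taken from Cronbach. The function names are illustrative.

```python
def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Estimated reliability of the difference score X - Y.

    r_xx, r_yy: reliabilities of the two tests
    r_xy: correlation between the two tests (must be below 1)
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be less than 1")
    r_av = (r_xx + r_yy) / 2.0  # average reliability of the two tests
    return (r_av - r_xy) / (1.0 - r_xy)


def implied_intercorrelation(r_xx: float, r_yy: float, r_diff: float) -> float:
    """Solve the same formula for r_xy, given a reported difference reliability."""
    r_av = (r_xx + r_yy) / 2.0
    return (r_av - r_diff) / (1.0 - r_diff)


# Reliabilities quoted in the footnote: Picture Completion 0.34, Comprehension 0.44.
# The intercorrelation is back-solved (an assumption), since the paper omits it.
r_xy_assumed = implied_intercorrelation(0.34, 0.44, -0.15)
print(round(r_xy_assumed, 2))                                      # 0.47
print(round(difference_reliability(0.34, 0.44, r_xy_assumed), 2))  # -0.15
```

The estimate turns negative whenever the intercorrelation of the two subtests exceeds their average reliability. That is the situation the footnote describes.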


For full scale standard scores outside the indicated range, the individual’s mean subtest standard score is divided by 4 to get 1 DU. For a given person, each subtest standard score is transformed into DU from the mean of all his subtests; these DU are expressed as “+” or “-”. For example, if a subtest standard score were between 1.5 and 2.5 DU above the mean for a particular person, it would be assigned a value of “+”. If that subtest were 2.5-3.5 DU below the mean, it would be assigned “--”. Standard scores in the range ±0.5 DU from the mean are assigned “0”. The pattern, in terms of these + and - notations, is then compared with a set of paradigm patterns given by Wechsler in his Table 30 (6, p. 150). Besides the specific paradigms for Organic Brain Disease, Schizophrenia, Neurotics, Adolescent Psychopaths, and Mental Defectives, certain subsidiary signs are furnished. These purport to aid in finer discrimination. Two comments are in order here: (a) for the usual case, i.e., where the mean subtest standard score is in the interval 8-11, the DU is derived not from the data for the individual being diagnosed, but from group norms; (b) the use of diagnostic profiles derived from group norms probably violates the clinician’s canon that the genotypic is preferable to the phenotypic approach.
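The DU rule described above can be sketched in Python. This is a minimal illustration that rests on several assumptions:

- The full-scale standard score is taken to be the sum of the ten subtest standard scores passed in, with Vocabulary excluded.
- The ±0.5 boundary is treated as inclusive.
- Only the "0" band is coded. The complete band table from Wechsler's Table 30 is not reproduced in this section.

The function names and the sample scores are hypothetical.

```python
from statistics import mean


def deviation_unit(scores: dict[str, float]) -> float:
    """Size of one deviation unit (DU) under the rule described in the text.

    Assumption: the full-scale standard score is the sum of the subtest
    standard scores given here (Vocabulary excluded).
    """
    full_scale = sum(scores.values())
    if 80 <= full_scale <= 110:
        return 2.0  # fixed group-norm value: 1 DU = 2 standard-score points
    return mean(list(scores.values())) / 4.0  # otherwise: the person's own mean / 4


def du_profile(scores: dict[str, float]) -> dict[str, float]:
    """Deviation of each subtest from the person's own subtest mean, in DU."""
    du = deviation_unit(scores)
    m = mean(list(scores.values()))
    return {name: (s - m) / du for name, s in scores.items()}


def code_zero_band(deviations: dict[str, float]) -> dict[str, str]:
    """Mark subtests within +/-0.5 DU (inclusive, an assumption) as "0".

    Other subtests are left as signed DU strings, because the full band
    table is not reproduced in this section.
    """
    return {
        name: "0" if abs(d) <= 0.5 else f"{d:+.2f} DU"
        for name, d in deviations.items()
    }


# Hypothetical standard scores for one person (not data from the paper).
person = {
    "Information": 12, "Comprehension": 12, "Arithmetic": 9,
    "Digit Span": 9, "Similarities": 11, "Picture Completion": 10,
    "Picture Arrangement": 10, "Object Assembly": 8,
    "Block Design": 10, "Digit Symbol": 9,
}
print(deviation_unit(person))  # 2.0 (full scale = 100, inside 80-110)
print(code_zero_band(du_profile(person)))
```

For this sample the DU is the fixed value 2.0, whatever this person's own spread of scores. This illustrates comment (a): in the usual case the unit comes from group norms, not from the individual.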

D. EVALUATION

1. The Efficiency of the WB

The Law of Parsimony would dictate that, other things being equal, a test is good to the extent that it is simple and short. For justifiable inclusion in a battery a subtest should make a unique contribution, and where the battery is to be interpreted diagnostically, this requirement is quite compelling. Lest this requirement of “unique contribution” be confused with the criterion of univocality to be discussed later, the criterion of efficiency may be restated. In any battery the attempt should be made to minimize subtest intercorrelations while maximizing correlations of subtests with external criteria.² When such conditions exist, the battery is composed of subtests which approach factorial “purity,” i.e., each subtest is saturated with a single factor which has negligible loadings on the other subtests. Conversely, when subtests are intercorrelated, they are measuring the same factor; then each is essentially a duplication of one of the others, and the Law of Parsimony would dictate that all but the best be discarded. In other words, no advantage is gained by testing the same aspect (trait, ability, etc.) more than once, except possibly in increased reliability. The last desideratum may be accomplished more readily by increasing subtest length.

²That Wechsler is either not aware of this requirement or does not subscribe to its necessity is evidenced by his statement in connection with the Object Assembly test: “In spite of its limitations, . . . (it) has a number of compensating features. . . . While the test correlates poorly with almost every one of our subtests it does contribute something to the total score . . . and (low) correlations . . . are primarily due to the large deviation of a relatively small and seemingly special group of individuals” (6, p. 97). Note also Wechsler’s statements relative to the Block Design Test (6, p. 72).

Wechsler’s (6) data show an average inter-r of 0.45; 2/3 of the inter-r’s exceed 0.40. The magnitude of the intercorrelations indicates that, despite the nominally different captions, groups of subtests are extensions of some prototype test.³ The WB is thus inefficient in the sense that the present 10 subtests might be reduced to 4 or 5 without loss in diagnostic power. An explanation of the inefficiency thus defined is offered in terms of two suggested causes: (a) the manner of original selection of the subtests, and (b) intra-subtest heterogeneity. An analysis of the reasons given by Wechsler for inclusion of the various subtests will illustrate these points.
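The two summary statistics used here, the average inter-r and the share of inter-r's above 0.40, can be computed from any correlation matrix as sketched below. Wechsler's matrix is not reproduced in this paper, so the 4 × 4 matrix is hypothetical and serves only to show the computation. "Exceed" is read as strictly greater than the threshold, which is an assumption. The function name is illustrative.

```python
from itertools import combinations


def intercorrelation_summary(
    r: list[list[float]], threshold: float = 0.40
) -> tuple[float, float]:
    """Mean off-diagonal correlation and share of pairs strictly above `threshold`.

    `r` is a symmetric correlation matrix; only the upper triangle is used.
    """
    pairs = [r[i][j] for i, j in combinations(range(len(r)), 2)]
    avg = sum(pairs) / len(pairs)
    share_above = sum(1 for x in pairs if x > threshold) / len(pairs)
    return avg, share_above


# Hypothetical 4-subtest matrix for illustration only (not Wechsler's data).
r = [
    [1.00, 0.60, 0.50, 0.30],
    [0.60, 1.00, 0.45, 0.35],
    [0.50, 0.45, 1.00, 0.40],
    [0.30, 0.35, 0.40, 1.00],
]
avg, share = intercorrelation_summary(r)
print(f"average inter-r = {avg:.2f}, share above .40 = {share:.2f}")
```

Applied to the WB and PMA matrices, the same two numbers support the comparison in footnote 3: 0.45 and 2/3 for the WB, against 0.34 and 1/3 for Thurstone's PMA.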

The Information test was included because, “The test is of value because it gives the subject’s range of information” (6, p. 79).

The Comprehension test was included because, “. . . off hand it might be termed a test of common sense . . . (and) . . . when given orally, one of the most gratifying things about it . . . is the rich clinical data which it furnishes us about the subject” (6, p. 80).

The Arithmetic test was included because, “. . . it has long been recognized as a sign of mental alertness . . . a good measure of general intelligence” (6, p. 82).

The Block Design test was included because, “. . . its author’s (Kohs) enthusiasm is fully justified. . . . In fact, it correlates better with comprehension, information, and vocabulary than some of the verbal tests themselves” (sic!) (6, p. 92).

The remaining subtests are justified by Wechsler in a similar fashion. Probably none of the justifications is scientifically satisfying. The inconsistent rationale for inclusion, characterized as it is by ad hoc and retrospective reasoning, seriously impugns the value of the battery as a producer of usable diagnostic patterns. While it is possible that the skilled clinician, when considering the performance of one individual, might profit from the exhibited WB subtest pattern, it is also quite possible that he might secure the same information by other means, or even that he might be misled.

³What is meant here is that the WB is a “battery” in name only. Perhaps the suggested prototype is a test of Spearman’s “g”. The intercorrelations are “high” when compared with those of Thurstone’s PMA, where the average inter-r = 0.34 and only 1/3 of the inter-r’s exceed 0.40.


The second “cause” suggested for the WB’s inefficiency was termed “intra-subtest heterogeneity.” Such heterogeneity involves the confounding, within a single subtest, of items which seem to be testing different things. Illustrative examples may be chosen from the Comprehension test. The first item reads, “What is the thing to do if you find an envelope in the street that is sealed, addressed and has a new stamp?” Although this item may call for common sense (as Wechsler believes), the answer may also be mediated by intelligence (as usually tested), curiosity, social responsibility, etc. Item 3 of the same test reads, “Why should we stay away from bad company?” Aside from the objection that the question is of the form, “When did you stop beating your wife?” the answer may involve intelligence, acceptance of the prevailing moral code, early training, degree of confidence in personal incorruptibility, etc. Item 4, “Why should people pay taxes?” begs the question also; its answer may involve a particular political philosophy, special economic status, etc. Other items in this and other subtests are open to similar objection. They are complexly determined in a manner which varies from item to item of the same subtest. One is tempted to ask why Wechsler did not establish item patterns for individual subtests; as it is, total score on a given subtest cannot be thought of as a unique measure to be accorded its proper place in a diagnostic profile. The high intercorrelations among the subtests may well be a function of this complexity; but it is just this sort of complex determination which Wechsler was trying to avoid when he first dichotomized the WB into a verbal and a performance part, and then further refined these divisions into subtests.

2. The Validity of the WB

Wechsler validated the WB as an intelligence test, not as a diagnostic instrument, through the usual techniques of correlation with external criteria, viz., Stanford-Binet, teachers’ ratings, psychiatrists’ estimates, etc. The high subtest intercorrelations noted in the previous section challenge, if only indirectly, the validity of the derived patterns. Some factor analysis material is pertinent here. Table 1 gives the final rotated factor matrix extracted by Balinsky (1) from original data furnished by Wechsler. The names of the factors are irrelevant here, but the magnitude of the communalities is important. For Balinsky, these average about .400; the writer has factor analyzed the intercorrelations of Wechsler’s Table 41 (6, p. 223) and obtained communalities which averaged approximately .500. The discrepancy may be attributed to the fact that the two analyses were based on data from subjects of different age ranges.


TABLE 1
FACTOR ANALYSIS OF WECHSLER’S DATA ON INTERCORRELATIONS BETWEEN SUBTESTS OF WECHSLER-BELLEVUE SCALE*

                                   Factors
Subtests                  I       II      III     IV       h²
Comprehension            .540    .059    .270    .071     .373
Information              .504    .018    .415    .125     .442
Arithmetic               .034    .182    .551   -.127     .354
Digit Span               .051    .496    .126   -.106     .276
Picture Arrangement      .244   -.087    .381    .246     .273
Picture Completion      -.085    .115    .403    .379     .326
Object Assembly          .025    .001    .033    .729     .539
Digit Symbol             .395    .364   -.101    .208     .342
Block Design             .174   -.048    .231    .743     .639

*After Balinsky (1). The Similarities subtest was omitted by Balinsky because of an “insufficient” number of cases. Data are based on ages 25-29.

The subtest communalities account for a little less than half of the total variance. This poses a serious problem for proponents of the diagnostic use of patterns derived from the WB. With smaller communalities (and assumed negligible error variance), most of the variance would be attributable to specific factors, i.e., the subtests would be making individual and separate contributions to the battery. This would imply meaningfulness in the pattern of subtest totals. Conversely, if the communalities were larger (again with negligible error variance), it would be inferred that common factors were the principal determinants of the test total. In such a case, use of the WB might lead to a profile in terms of the common factors, rather than of the subtest totals. Unfortunately, since the obtained communalities are neither “small” nor “large,” the WB subtests are neither “pure” enough to be good tests of specific factors nor overlapping enough to be good tests of common factors.
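The communalities in Table 1 are sums of squared loadings. The Python sketch below recomputes them from the printed factor loadings, so the "a little less than half" claim can be checked directly. One loading is garbled in the extracted text, Arithmetic on factor III. It is taken as .551 here, because that value reproduces the printed h² of .354; this is a reconstruction, not a confirmed figure. The recomputed values match the printed h² column to rounding, with one exception. Object Assembly gives .533 against a printed .539, which may reflect a misread loading.

```python
# Factor loadings on factors I-IV from Table 1 (after Balinsky).
# Arithmetic's factor III entry is garbled in the extraction; .551 is an
# assumption chosen because it reproduces the printed h^2 of .354.
LOADINGS = {
    "Comprehension":       (0.540, 0.059, 0.270, 0.071),
    "Information":         (0.504, 0.018, 0.415, 0.125),
    "Arithmetic":          (0.034, 0.182, 0.551, -0.127),
    "Digit Span":          (0.051, 0.496, 0.126, -0.106),
    "Picture Arrangement": (0.244, -0.087, 0.381, 0.246),
    "Picture Completion":  (-0.085, 0.115, 0.403, 0.379),
    "Object Assembly":     (0.025, 0.001, 0.033, 0.729),
    "Digit Symbol":        (0.395, 0.364, -0.101, 0.208),
    "Block Design":        (0.174, -0.048, 0.231, 0.743),
}


def communality(loadings: tuple[float, ...]) -> float:
    """h^2: variance a subtest shares with the common factors (sum of squared loadings)."""
    return sum(a * a for a in loadings)


h2 = {name: communality(l) for name, l in LOADINGS.items()}
for name, value in h2.items():
    # 1 - h^2 is the remaining specific-plus-error variance.
    print(f"{name:<20} h2 = {value:.3f}  unique = {1 - value:.3f}")
print(f"mean h2 = {sum(h2.values()) / len(h2):.3f}")  # about .395
```

The mean comes out near .395, in line with the "about .400" reported for Balinsky's analysis. Each subtest therefore keeps more than half of its variance as specific or error variance.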

The evaluation of the WB’s validity via factor analysis is quite specific. Two further approaches to the question will be explored here. The first involves a discussion of the philosophy of diagnostic tests; the second involves examination of Wechsler’s attempts to determine diagnostic validity.

Wechsler’s use of the WB as a diagnostic tool assumes (a) that the subtests are separate entities (or, as a corollary, that they test distinct behavioral aspects of the organism), and (b) that mental illness involves differential impairment of the aspects tested. From these assumptions it may be deduced that, to the extent that mental disease syndromes are different, a more or less unique subtest profile is typical of each, i.e., there exists an isomorphism between test structure and intellective structure. Indeed, Wechsler says:


The general clinical problem of test patterning consists of establishing associations between particular test score divergencies, and specific clinical entities. . . . The above procedure implies a prior demonstration of the existence of established associations or correlations between particular test “signs” and disease entities (6, p. 152).

It has been demonstrated previously in this paper that the subtests are not separate entities. As to the assumption that mental illness involves differential impairment of the aspects tested, a further question is implied: “Can a test be constructed so that its parts are isomorphic with intellective structure?” Such a question cannot be answered, since its answer depends on the establishment of definite information as to the nature of intellective structure, independent of tests. It is the unavailability of such information that leads to operational definitions of intelligence of the type, “Intelligence is what intelligence tests test.”

It was noted previously that no statistical validation of the diagnostic paradigms was attempted by Wechsler. He says on this point:

The extensive statistical work required for such correlation has only barely begun, but we have accumulated sufficient clinical experience with the Bellevue Scale to warrant a presentation of certain empirical findings which we believe will be of use to the clinician in his attempts at a diagnosis. These have been put together in Table 30 which attempts to summarize the test patterning met with in the various common mental disease entities (6, p. 152).

As far as the writer knows, the results of the “extensive statistical work” as it applies to the validation of test patterns have not yet been published. As to the stability of the “empirical findings” of “clinical experience,” the empirical findings of Rabin (3), Rapaport et al. (4), and others are sufficient comment.

3. The Univocality of the WB

Univocality as used here refers to the uniqueness, diagnostically, of a particular test pattern as expressed in Wechsler’s deviation notation. If the profile paradigms given by Wechsler in his Table 30 are univocal, two implications may be noted: (a) a paradigm profile should be capable of fit by only a single test profile (practically, this should be amended to a sharply limited number of different test profiles); (b) a particular test profile should not meet the requirements of more than one paradigm. These implications comprise the essence of an isomorphism, a one-to-one correspondence. Abstractly, for an isomorphism, for any element in collection A there exists one and only one element (image) in collection B; conversely, to any image in collection B there corresponds one and only one element in collection A. By analogy, collection A is the set of paradigms furnished by Wechsler; collection B is composed of individual test profiles, i.e., possible patterns on the WB. Presumably there is an additional (and unknown) collection, say A′, which consists of the mental disease syndromes in one-to-one correspondence with the paradigms, and also, of course, with the individual test profiles.

Certainly the mathematical requirements for an isomorphism are far more rigorous than those which may be demanded of a diagnostic test. Intellective structure is a concept built partially from test information and partially from clinical insights. Although it is idle to speculate on the possibility of constructing a test which will mirror the structure of a concept, it is possible, and Wechsler may have attempted to do this, to define the concept in terms of test operations. If the latter be the case, we have a right to expect that the proposed diagnostic instrument be univocal in the practical sense in which the term is used here.

When the WB is analyzed for univocality, it is found wanting. If one considers the range of “+” and “-” signs, by subtest, for each of the paradigms given by Wechsler in his Table 30, one is impressed by their elasticity. Elementary combinatorial analysis shows that 9 possible test patterns “fit” Organic Brain Disease; 1,152 fit Schizophrenia; 4 fit Neurotics; 768 fit Adolescent Psychopaths; 48 fit Mental Defectives. Let us now consider the Schizophrenia paradigm, first because schizophrenics comprise a formidable proportion of the mentally ill, and second because the case against Wechsler’s thesis may be illustrated most forcefully with this example.

Table 2 gives test patterns for fictitious individuals A and B. They were constructed by taking for each subtest the extreme DU’s permitted by the Schizophrenia paradigm. For example, Similarities may vary (according to Wechsler) from “+” to “--”, so individual A is given a score of “+” and B is given “--”. Next, we assign a rank order to the DU according to the following scheme: “--” = 1; “-” = 2; “0” = 3; “+” = 4; “++” = 5. Now, if both A and B are classifiable as schizophrenics by the paradigm (as they are here), then it is not unreasonable to expect that the rank order correlation of the DU will be fairly high. Actually, with Vocabulary excluded (as it usually is), rho = 0.09. This lack of relationship, coupled with the large number of patterns possible (1,152), appears to falsify the first implication of univocality: the number of different patterns fittable to the paradigm is not sharply limited. Either A and B are not both schizophrenic, in which case the paradigm fails; or they may both be schizophrenic, in which case the results are of such generality as to be of doubtful use. There is of course the third, possibly even more distressing, alternative that neither A nor B is schizophrenic! But this is a question of validity.

TABLE 2*
TEST CHARACTERISTICS OF SCHIZOPHRENIA IN TERMS OF DEVIATIONS FROM MEAN WEIGHTED SCORES

I II Subtest From To

+ + Information + Comprehension + Arithmetic 0 Digit Span 0 + Similarities + + + Vocabulary + + Picture Completion 0 Picture Arrangement - Object Assembly - - Block Design 0 + Digit Symbol - -

- - -

-- 0

*Adapted from Wechsler, D., Measurement of Adult Intelligence, Table 30.
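The two computations behind this argument are counting the patterns a paradigm admits and rank-correlating two coded profiles. Both are sketched below in Python under stated assumptions:

- Each paradigm entry is treated as a range between two extreme symbols on the five-point coding given in the text. The count is then the product of the range widths.
- The paper does not say how tied ranks were handled. This sketch uses one common convention, averaging the ranks of tied values.
- Table 2 could not be recovered cleanly, so the sketch is not expected to reproduce rho = 0.09.

The paradigm and the two profiles shown are hypothetical, not Wechsler's Table 30.

```python
from math import prod, sqrt

# Five-point coding of the deviation symbols, as given in the text.
RANK = {"--": 1, "-": 2, "0": 3, "+": 4, "++": 5}


def count_fitting_patterns(paradigm: dict[str, tuple[str, str]]) -> int:
    """Count the distinct profiles a paradigm admits.

    Each subtest may take any symbol between its two extremes (inclusive),
    so the count is the product of the range widths.
    """
    return prod(abs(RANK[hi] - RANK[lo]) + 1 for lo, hi in paradigm.values())


def average_ranks(xs: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(a: list[float], b: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("rho is undefined when a profile has no variation")
    return cov / sqrt(var_a * var_b)


# Hypothetical three-subtest paradigm for illustration (NOT Wechsler's Table 30).
toy = {"Similarities": ("+", "--"), "Digit Span": ("0", "+"), "Object Assembly": ("-", "-")}
print(count_fitting_patterns(toy))  # 8 = 4 * 2 * 1

# Hypothetical coded profiles for two people who both fit some paradigm.
a = [RANK[s] for s in ["+", "+", "0", "-", "--"]]
b = [RANK[s] for s in ["--", "0", "0", "+", "-"]]
print(round(spearman_rho(a, b), 3))
```

Counts multiply across subtests, so even modest ranges on each subtest yield hundreds of admissible profiles. This is how a figure like 1,152 can arise.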

The adequacy of the WB for the second implication of univocality (that a test pattern fit not more than one paradigm) may be analyzed as follows. Let us consider a new fictitious person, say C, with this DU pattern: Information +; Comprehension +; Arithmetic 0; Digit Span 0; Similarities +; Vocabulary +; Picture Completion 0; Picture Arrangement 0; Object Assembly -; Block Design 0; Digit Symbol -. If the psychometrician “shops around” in the paradigms, he will find that C’s test pattern fits the Schizophrenics and Neurotics paradigms almost equally well; the single exception is Digit Span, in which the Schizophrenic should get 0 and the Neurotic -, according to Wechsler.

The above example, although fictitious, is not impossible of occurrence. It effectively falsifies the second implication of univocality, that a particular test pattern should not fit more than one paradigm. It may be objected that, inasmuch as Wechsler’s Table 30 gives additional comments which purport to aid in differential diagnosis, the force of the argument is mitigated. However, if the objection be valid, it almost follows that the profiles are necessarily of little value; the comments, if they distinguish the clinical entities, would be sufficient in themselves.
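The "shopping around" described above amounts to checking one coded profile against every paradigm and counting out-of-range subtests. The Python sketch below does this, using person C's profile exactly as given in the text. The two paradigms are toy stand-ins covering only four subtests. Only their Digit Span entries follow the text (Schizophrenic 0, Neurotic -); every other range is invented for illustration. The function names are hypothetical.

```python
RANK = {"--": 1, "-": 2, "0": 3, "+": 4, "++": 5}


def misfits(profile: dict[str, str], paradigm: dict[str, tuple[str, str]]) -> list[str]:
    """Subtests whose coded value falls outside the paradigm's allowed range."""
    out = []
    for subtest, (lo, hi) in paradigm.items():
        low, high = sorted((RANK[lo], RANK[hi]))
        if not low <= RANK[profile[subtest]] <= high:
            out.append(subtest)
    return out


def matching_paradigms(profile, paradigms, tolerance: int = 0) -> list[str]:
    """Names of paradigms the profile fits, allowing up to `tolerance` misfits."""
    return [name for name, p in paradigms.items() if len(misfits(profile, p)) <= tolerance]


# Person C's profile, as given in the text.
C = {
    "Information": "+", "Comprehension": "+", "Arithmetic": "0",
    "Digit Span": "0", "Similarities": "+", "Vocabulary": "+",
    "Picture Completion": "0", "Picture Arrangement": "0",
    "Object Assembly": "-", "Block Design": "0", "Digit Symbol": "-",
}

# Toy paradigms: only the Digit Span entries follow the text;
# every other range is invented for illustration.
PARADIGMS = {
    "Schizophrenia (toy)": {"Information": ("+", "++"), "Arithmetic": ("-", "0"),
                            "Digit Span": ("0", "0"), "Object Assembly": ("--", "-")},
    "Neurotics (toy)": {"Information": ("0", "+"), "Arithmetic": ("-", "0"),
                        "Digit Span": ("-", "-"), "Object Assembly": ("-", "0")},
}

print(matching_paradigms(C, PARADIGMS))               # ['Schizophrenia (toy)']
print(matching_paradigms(C, PARADIGMS, tolerance=1))  # ['Schizophrenia (toy)', 'Neurotics (toy)']
```

Allowing a single misfit is enough to make C match both toy paradigms. This mirrors the "almost equally well" fit the text describes.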


E. SUMMARY AND CONCLUSIONS

The WB, as a diagnostic tool only, has been analyzed for adequacy with respect to suggested criteria of efficiency, validity, and univocality. The following conclusions were reached.

1. Efficiency

High subtest intercorrelations, inconsistent rationale of inclusion of subtests, and confounding within subtests of a variety of items indicate that the WB is not efficient, i.e., it does not yield its measure with minimum operations.

2. Validity

The lack of orthodox validation data on the diagnostic paradigms, the failure to demonstrate justification for the assumption of test-intellective structure isomorphism, and factor analysis which reveals confounding of specific and common factors in the subtests all indicate that the WB lacks validity.

3. Univocality

Since a wide variety of individual test patterns may be fitted to a specific paradigm, and since a particular test pattern may fit more than one paradigm, it appears that the WB is not univocal as a diagnostic tool.

REFERENCES

1. BALINSKY, B. An analysis of the mental factors in various age groups from 9 to 60. Genet. Psychol. Monogr., 1941, 23, 191-234.
2. CRONBACH, L. J. Essentials of Psychological Testing. New York: Harper, 1949.
3. RABIN, A. I., & GUERTIN, W. H. Research with the Wechsler-Bellevue test, 1945-1950. Psychol. Bull., 1951, 48, 211-248.
4. RAPAPORT, D., with collaboration of SCHAFER, R., & GILL, M. Manual of Diagnostic Testing. New York: Macy, 1944.
5. WATSON, R. I. The use of the Wechsler-Bellevue Scale: A supplement. Psychol. Bull., 1946, 43, 61-68.
6. WECHSLER, D. The Measurement of Adult Intelligence. (3rd ed.) Baltimore: Williams & Wilkins, 1944.

AGO, Personnel Research Section
Department of the Army
Washington 25, D.C.
