

DOCUMENT RESUME

ED 251 483                                                      TM 840 693

AUTHOR        Schrader, William B.
TITLE         The Graduate Management Admission Test: Technical Report on Test Development and Score Interpretation for GMAT Users.
INSTITUTION   Educational Testing Service, Princeton, N.J.
SPONS AGENCY  Graduate Management Admission Council, Princeton, NJ.
PUB DATE      [84]
NOTE          24p.
PUB TYPE      Reports - Research/Technical (143)
EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   *Administrator Education; Business Administration; *College Entrance Examinations; *Graduate Study; Higher Education; Item Analysis; *Scoring; Student Characteristics; *Test Construction; Test Format; Testing; Testing Programs; Test Interpretation; Test Items; Test Manuals; Test Reliability; Test Validity
IDENTIFIERS   *Graduate Management Admission Test

ABSTRACT
This report provides information on test development, test administration, and score interpretation for the Graduate Management Admission Test (GMAT). The GMAT, first administered in 1954, provides objective measures of an applicant's abilities for use in admissions decisions by graduate management schools. It is currently composed of five sections: (1) Reading Comprehension; (2) Problem Solving; (3) Practical Judgment; (4) Data Sufficiency; and (5) Usage. New test forms are developed systematically from a test item pool. New items are pretested at regular test administrations and evaluated empirically. Test specifications are the blueprints for assembling a final test form. Uniform test administration conditions are maintained through responsible supervision and careful test security procedures. In addition, several publications provide all examinees with GMAT test information and test-taking strategies. The GMAT score scale has a mean of 500 and a standard deviation of 100 for the base group, and a possible range from 200 to 800. Angoff's methods are used to equate scores. Continuing validity studies (predictive, content, and construct) provide an important basis for score interpretation. Information on reliability, standard error, descriptive statistics, and biographical data for examinees is also given to assist in score interpretation. Appendices contain sample GMAT test items and methods for calculating reliability coefficients and equating parameters. (BS)

Reproductions supplied by EDRS are the best that can be made from the original document.

Page 2: DOCUMENT RESUME TM 840 693 Schrader, William B. · The Graduate Management Admission Test: Technical Report on Test Development and Score Interpretation ... for Data Sufficiency was

THE GRADUATE MANAGEMENT ADMISSION TEST:
Technical Report on Test Development and Score Interpretation for GMAT Users

William B. Schrader


Prepared for the Graduate Management Admission Council with the advice and assistance of the Graduate Management Admission Council Selection Committee and a number of colleagues at Educational Testing Service


CONTENTS

I. INTRODUCTION
   Purpose of GMAT
   Evolution of Test Composition

II. DEVELOPING A NEW FORM OF GMAT
   Developing an Item Pool
   Pretest Item Analysis: Purpose
   Pretest Item Analysis: An Example
   Assembling the Final Test Form

III. ADMINISTERING THE TEST
   Maintaining Uniform Testing Conditions
   Preparing the Examinee for the Test

IV. FACILITATING SCORE INTERPRETATION
   Defining the Score Scales
   Score Equating
   Validity Studies: Purpose and Background
   Validity Studies: Methods
   Validity Studies: Results
   Content and Construct Validity
   Reliability of Test Scores
   Standard Error of Measurement
   Standard Error of a Score Difference
   Descriptive Statistics of Test Scores
   Biographical Information for GMAT Examinees

APPENDIX A: Sample Items for Item Types Used in GMAT

APPENDIX B: Calculating Reliability Coefficients and Equating Parameters


THE GRADUATE MANAGEMENT ADMISSION TEST:
Technical Report on Test Development and Score Interpretation for GMAT Users

I. INTRODUCTION

The GMAT was first administered in February 1954 to about 1,300 prospective students of graduate schools of business. Less than a year earlier, in March 1953, a conference at which 12 graduate schools of business were represented had agreed that a nationwide testing program in this area would be useful. There followed a period of vigorous activity. Two meetings of a policy committee formed to guide the new program were held. An important focus of this effort was the identification of suitable abilities to be measured by the test. All necessary steps for scoring, publicizing, developing, and administering the new test were worked out by ETS with the advice and approval of the Policy Committee. The test was called the Admission Test for Graduate Study in Business until 1976. In that year the test name was changed to Graduate Management Admission Test.

From the outset the program was guided by representatives of participating schools. The test, which was the focus of the program, was prepared by ETS test development staff members and was administered, under secure conditions, throughout the United States and in a number of foreign cities. Scoring, reporting, and various statistical services designed to aid in test development and score interpretation were provided. Finally, research aimed at improving program effectiveness was identified as an integral part of program activities. As the program developed over the years, services relevant to admission but not directly concerned with testing were initiated. This report, however, will be limited to matters directly related to the GMAT.

The incorporation, in 1970, of the Graduate Business Admission Council (now the Graduate Management Admission Council) defined explicitly the role of the Council with respect to the test and other program activities. The Council, which consists of representatives of 54 graduate schools of management, is both a service organization and a professional organization. As a service organization it seeks to improve the selection process for graduate management schools by developing and administering appropriate testing instruments, and informing schools and students as to the appropriate use of such instruments and other materials related to the selection process. In addition, it serves as a medium of information exchange between students and schools. As a professional organization it serves as a forum for the interchange of ideas and information. The Council sponsors the GMAT; ETS consults with the Council on all matters of general policy affecting program activities that it conducts for the Council.

Purpose of GMAT

The purpose of the GMAT is to provide objective measures of an applicant's abilities for use by graduate management schools as one consideration in making admissions decisions. To make the test as useful as possible for this purpose, the test must measure abilities that are relevant to successful performance in graduate management school and that are developed by a wide range of educational experiences; it must be sufficiently long to provide a reasonably dependable measure; it must be administered under uniform, secure conditions; and it must be scored accurately. Finally, scores must be reported promptly in a convenient form and accompanied by materials to aid in their use. When these conditions are fulfilled, the test scores may be relied upon by admissions officers to supplement other data about applicants, particularly previous academic performance.

Evolution of Test Composition

The composition of the test with respect to the abilities measured and the relative weight given to each ability are the characteristics that define a particular test.

In planning the original 1954 form of the test, it seemed clear that both verbal and quantitative abilities were important, and that roughly equal weight should be given to each. Tests of these abilities were considered to be appropriate for students who had enrolled in different undergraduate programs. Tests of these abilities that had proved to be successful in other programs, that seemed appropriate on judgmental grounds, and that could be produced expeditiously were chosen for the 1954 test. The test consisted of four separately timed sections, as follows:

I. Verbal (25 minutes)
II. Quantitative (65 minutes)
III. Best Arguments (30 minutes)
IV. Quantitative Reading (55 minutes)

Beginning in 1955, the Total test score was supplemented by a Verbal part score, based on the Verbal and Best Arguments sections, and a Quantitative part score, based on the Quantitative section. Quantitative Reading items were not included in either part score. The new part scores provided users with information about an applicant's relative standing in verbal and quantitative ability.

The composition of the test was changed in several ways beginning in November 1961. Three new item types, Organization of Ideas, Directed Memory (later called Reading Recall), and Data Sufficiency, were introduced, in part because research evidence indicated that they would increase the predictive effectiveness of the test (Pitcher, 1960). The Organization of Ideas section provided an objective measure of an examinee's ability to identify a logical structure within a set of statements. The Directed Memory section measured reading comprehension under conditions that prevented the examinee from referring back to the reading passages when answering questions based on the passages. The Data Sufficiency item type was a measure of quantitative ability based on the examinee's ability to analyze a mathematical problem without carrying out the actual solution. Quantitative Reading, Verbal, and Best Arguments were dropped from the test. Of the five parts, Quantitative and Data Sufficiency defined the Quantitative part score and the other three parts defined the Verbal part score. The test included the following sections.

I. Directed Memory (Reading Recall) (35 minutes)
II. Quantitative (75 minutes)
III. Organization of Ideas (20 minutes)
IV. Data Sufficiency (15 minutes)
V. Directed Memory (Reading Recall) (35 minutes)

In November 1966, Organization of Ideas was replaced by a 20-minute Verbal Omnibus section that included antonyms, analogies, and sentence completion items. Except for this change, the basic structure of the test remained the same until 1972, when two 20-minute sections of Practical Business Judgment replaced 35 minutes of the time allocated to Reading Recall. Practical Business Judgment items were included in the Verbal part score. At the same time, the Verbal Omnibus section was shortened from 20 minutes to 15 minutes.

In 1976 several changes were introduced in the composition of the test. Of the two sections that defined the Quantitative part score, the 75-minute Quantitative section that emphasized Data Interpretation items was replaced by a 40-minute Mathematics section that emphasized problem solving items, and the time allotment for Data Sufficiency was increased from 15 to 30 minutes. Several changes were made in the sections included in the Verbal part score. The 15-minute Verbal Omnibus section was replaced by a 15-minute Usage section. The new section called for the identification of errors in Standard Written English. It was introduced in recognition of the importance of written expression in management, a point that was highlighted in the findings of the Casserly and Campbell (1973) survey of skills and abilities needed by graduate students. A further change, introduced in 1977, substituted 30 minutes of Reading Comprehension for the 35 minutes of Reading Recall. It was judged that these item types measured very similar abilities, but that Reading Comprehension would present fewer complications in test administration.

The changes introduced in 1976 and 1977 brought GMAT to its present composition, which is as follows.


I. Reading Comprehension (30 minutes)
II. Problem Solving (40 minutes)
III. Practical Judgment (40 minutes)
IV. Data Sufficiency (30 minutes)
V. Usage (15 minutes)

A complete recent form of GMAT is published in the 1979-80 Guide to Graduate Management Education. Appendix A of the present report gives sample items for each of the item types included in the test from 1954 to the present time.

This brief review of the evolution of the test suggests that changes have been gradual and relatively infrequent. Thus, there is a strong continuity within the test over the years, a condition that is highly desirable if scores earned at different times are to be treated as comparable.

II. DEVELOPING A NEW FORM OF GMAT

In an ongoing testing program, a systematic plan for developing new forms of the test is essential, particularly to minimize the possibility that examinees will have an opportunity to anticipate some of the questions included in the test. Ordinarily, a new form of the test should measure the same abilities and be at the same difficulty level as previous forms, but should be composed mainly or exclusively of items not included in any previous form. If changes are to be introduced between a new form and its predecessors, they should be made deliberately, not inadvertently, and gradually, not abruptly, in order to maintain comparability between scores earned at different test administrations over a period of several years. This discussion will be based on the more typical situation in which the new form is designed to match earlier forms as closely as possible.

Each new form of GMAT is composed of objective test items that call for the examinee to choose among five options, of which only one is the best choice and is scored as correct. The first task in building a new form is to develop a supply of items that measure the abilities tested in the earlier forms and that are as free as possible of identifiable defects. It is also necessary to be able to compare the difficulty of new items in the pool with that of items included in previous tests. The new test form can then be matched with older test forms with respect to overall difficulty.

Developing an Item Pool

The indispensable first step in building an item pool is to create the first draft of an item. There are a number of item-writing rules, but their value is mainly in reducing the proportion of items that are found later to be defective. Items do need to be compatible with other items of the same type when used in a test, a fact that item writers consider in developing a possible item. The actual production of items seems to depend mainly on a good understanding of the ability measured by a particular kind of item, perceptiveness in identifying tasks that will be suitable in difficulty for GMAT examinees, and fertility in devising incorrect responses that will attract the less able examinees. This is another way of saying that item writing is an art.

Once the item has been drafted, however, a series of formal processes can be used to remedy apparent defects, particularly in the way in which the item is expressed. Obviously, faulty grammar, awkwardness of expression, and inconsistencies in style can be corrected by competent editing. Review by one or two persons familiar with the item type may identify items that may be ambiguous, especially to examinees who are very high in the ability measured, and items that may inadvertently give clues, particularly to sophisticated test takers, concerning the correct answer. Finally, items are reviewed by persons who are sensitive to expressions that are objectionable to women or to minority groups; items are then revised to remove this kind of defect.

Pretest Item Analysis: Purpose

Items that survive the review processes are next pretested at a regular test administration. Because items do not necessarily work in the way that their authors intended, it is important to make an empirical evaluation of each item before it is permitted to contribute to an examinee's score. Pretesting also makes it possible to control the difficulty level of new test forms. These two purposes correspond to the two main kinds of statistical analyses applied to pretest data. First, the relationship between examinees' performance on each item and their total scores on all items of that particular type helps to identify items that need to be revised or, possibly, discarded. Second, an index of the difficulty of the items, when adjusted to take account of the ability level of the pretest group, is useful in controlling the level of difficulty of the test.

The statistical analysis of each pretested item provides information about the relationship between performance on the item and a score based on a set of similar items in three different ways:

(a) the biserial correlation coefficient between the item and the total score on items of the same type;

(b) the mean score on items of the same type for examinees choosing each of the five options and for students who omit the item; and

(c) for students who rank in each fifth on the total scores, the number choosing each option or omitting the item.

Essentially, the biserial correlation coefficient is an objective index of the extent to which the examinees who answered the item correctly differ in average score from the remainder of the group. The biserial correlation coefficient, which is used for item analysis work at ETS, adjusts the result to take account of the percentage of students who give the correct answer.* This adjustment is considered to make the resulting correlation coefficients more nearly comparable for items at different difficulty levels. Experience in using item analysis results has indicated that items that have a correlation below .30 need to be reviewed with special care to try to find out why examinees who earn high total scores on the set of items do not perform appreciably better on the item than do examinees who earn low total scores.

Pretest Item Analysis: An Example

The detailed steps involved in using pretest data may be illustrated by discussing the example shown in Figure 1.

Information comparing the test scores of examinees who give the correct answer with those of examinees who choose one of the incorrect answers receives special attention. If too many high-scoring students choose a wrong answer on the item, it often happens that the question is open to misinterpretation. The statistical results are intended only to supplement the search for flaws in the item based on a thoughtful scrutiny of the item by reviewers.

The biserial correlation coefficient for the sample item, shown in the lower right-hand corner of the printout, was .54 for the group tested. This is well above the "danger-point" figure of .30. Thus, it is unlikely that the more detailed statistical results available for each option will reveal serious flaws in the item.

A second way of looking at the results is to consider the average test score (expressed on a scale to be described later) of those choosing each option. This analysis makes it possible to spot any wrong answer that seems to be attracting too many above-average students. On the sample item, the average total score on quantitative items for examinees who chose the correct option (designated by an asterisk) is 16.3; the highest average for an incorrect option is 12.5. The average for all 2,000 examinees is 13.0.

One feature of the item analysis procedure designed for the convenience of those who use the results is that the average total score is always set at 13.0 for the group on which the item analysis is based. The standard deviation of total scores for the total item analysis group is always set at 4.0. Thus, it is easy to tell whether the examinees choosing a particular option are above or below the average of the total group, and by how much. The use of a uniform scale for the test score enables persons who work with item analysis results to get some idea of how well an item is working by looking at the pattern of average total scores for the various options.
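The 13.0/4.0 convention is simply a linear rescaling of each total score; a minimal sketch, assuming the raw mean and standard deviation of the item analysis group are known (function name illustrative):

    def to_item_analysis_scale(raw_total, group_mean, group_sd):
        # Rescale so the item analysis group has mean 13.0 and SD 4.0
        return 13.0 + 4.0 * (raw_total - group_mean) / group_sd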

Along with the average test scores, the bottom section of the printout also shows how many examinees chose each option.

*The biserial coefficient is described more fully in statistical textbooks (e.g., Guilford and Fruchter, 1973).


Figure 1. Item analysis results for a sample item.

A man has exactly enough fencing to enclose a rectangular region 3 times as long as it is wide. He discovers that if he uses the same amount of fencing to enclose a square region, he can enclose 225 additional square feet. How many feet of fencing does he have?

(A) 30   (B) 120   (C) 150   (D) 675   (E) 900

[Item analysis printout (Educational Testing Service): for each fifth on total score, the number choosing each option or omitting the item; average criterion scores by option (correct option: 16.3; highest incorrect option: 12.5; omits: 13.1; total group: 13.0; examinees reaching the item: 13.4); base N = 2,000; P+ = .23; observed delta = 16.4; equated delta = 14.2; biserial r = .54.]

Of course, the item writer attempts to create wrong answers that will be attractive to examinees and thus to sharpen the differentiation between able and less able students on the ability measured by the item. The extent to which all options are attracting reasonable numbers of examinees tells the item writer how successful he or she has been in formulating effective wrong answers.

Besides the average score for each of the five possible responses, the printout also shows the average total score for examinees who omitted the item. An item is considered to be an "omit" only if the examinee has answered a subsequent item in the separately timed part of the test under consideration. This definition attempts to distinguish between "omits" (i.e., items that are considered but not answered) and items at the end of the test that the examinees may not even have read. In Figure 1, the group of examinees who omitted the sample item is fairly large and their average total score is slightly higher (13.1 vs. 13.0) than that for the entire item analysis group. However, because the mean score of examinees who reached the item is 13.4, those who omitted it have a slightly lower score than all examinees who reached it. For this item, a considerable proportion of examinees at all five ability levels omitted it. On the whole, the proportion of omits on this item is larger than would be ideal, but is not sufficiently large to warrant revising the item.


Finally, the numbers in the upper portion of the printout provide still greater detail on the relation between total score and responses. The right-hand column (headed "High N") shows the number of examinees in the top fifth on total score who gave each response, and the other four columns show the number of examinees in successively lower fifths who gave each response. It will be noted that the numbers for the correct response (B) increase consistently from 25 in the bottom fifth to 172 in the top fifth and that the numbers for the most popular incorrect response (D) show a downward trend. The figure for "TOTAL" shows the number who answered or omitted the item. Because there are exactly 400 examinees in each of the five groups, the difference between the figure reported for "TOTAL" and 400 shows how many did not reach this item. For the top fifth, only 44 did not reach it; for the bottom fifth, 143 did not reach it.

A preliminary idea of how difficult an item is can be obtained by dividing the number of examinees who answered it correctly by the number of examinees who reached it. For the sample item, this figure (P+) turned out to be .23. The figure in the box labeled "M-TOTAL" shows the average score on the total test for the examinees who reached the sample item. Because this average is 13.4, we may conclude that the examinees who reached this item were more able than the rest of the item analysis group. We need an estimate of how difficult the item would be for the total item analysis group. The measure of difficulty used for item analysis, called delta, provides such an estimate.

Delta is so defined that a higher value of delta means a more difficult item and thus a smaller percentage of examinees in the item analysis group who would answer it correctly. The definition also implies that a given change in the delta value has a greater effect on proportion correct for items in the middle of the difficulty range than for those at the extremes. The following table shows the proportion of the item analysis group who would be expected to give the correct answer for items having selected values of delta:

Delta Value    Proportion Correct
    17                .16
    15                .31
    13                .50
    11                .69
     9                .84
     7                .93

The delta scale for item difficulties is defined in terms of a normal curve having a mean of 13 and a standard deviation of 4. The percentage of the normal curve above a particular difficulty value is equal to the percentage of members of the item analysis group who answered the item correctly. Figure 2 illustrates this point.
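Concretely, delta is the normal deviate that cuts off the top proportion-correct of the curve, placed on the mean-13, SD-4 scale. A minimal sketch of the two-way conversion (function names illustrative):

    from statistics import NormalDist

    ND = NormalDist()  # unit normal curve

    def delta_from_p(p_correct):
        # Higher delta = harder item: the deviate cutting off the top p_correct
        return 13.0 + 4.0 * ND.inv_cdf(1.0 - p_correct)

    def p_from_delta(delta):
        # Expected proportion correct when all examinees reach the item
        return 1.0 - ND.cdf((delta - 13.0) / 4.0)

    # p_from_delta(17) ~ .16, p_from_delta(13) = .50, p_from_delta(9) ~ .84,
    # reproducing the table above.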

Because test construction often requires precise measurements of difficulty level, it is necessary to take account of the fact that different item analysis groups differ from each other in ability level. A relatively simple way of adjusting for the difference between two groups is applicable provided that sufficient items (usually 20 or more) have been administered to both groups. Each of these common items will have two values of delta, one for each group. When the pairs of deltas are plotted on ordinary graph paper, they generally fall along a straight line. It is then possible to determine a linear equation relating the two sets of deltas. The resulting equation can be applied to transform a delta obtained on one group to the corresponding delta for the other group. The process of equating item difficulties for a new pretest in continuing programs is facilitated by the fact that any item that has previously been equated can be used in the set of 20 or more items needed for equating the new items.

A delta value calculated for a particular item analysis group is called an observed (or raw) delta. Because item analysis groups vary in ability level, the observed delta for an item will be higher if it was administered to a less able group than if it had been administered to a more able group. Observed deltas can be adjusted statistically so that they represent the difficulty level that each item would have had if it had been administered to a standard reference group. The adjusted delta for an item, called its equated delta, may be compared directly to equated deltas for other items, even though the item analyses were based on different groups.
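The report says only that a linear equation is fitted to the paired deltas for the common items; one plausible fitting choice, sketched below, is the mean-sigma line, which matches the means and standard deviations of the two sets of deltas. The exact line-fitting method used at ETS is not specified here, so this is an illustration rather than the operational procedure.

    from statistics import mean, pstdev

    def delta_equating_line(new_group_deltas, ref_group_deltas):
        # Paired observed deltas for 20+ common items, one list per group
        a = pstdev(ref_group_deltas) / pstdev(new_group_deltas)
        b = mean(ref_group_deltas) - a * mean(new_group_deltas)
        return lambda observed_delta: a * observed_delta + b  # equated delta

A line of this kind is what carries the sample item in Figure 1 from its observed delta of 16.4, obtained on a less able group, to its equated delta of 14.2 on the reference-group scale.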

Figure 2. Percent correct for various values of delta when all examinees have reached the item. [Graph: a normal-ogive curve relating delta values (horizontal axis, 7 to 18) to percent correct (vertical axis, 0 to 100).]

The sample item shown in Figure 1 has an observed delta of 16.4 and an equated delta of 14.2. Clearly, the item is more difficult for the group on which its item analysis was based than it would have been for the standard reference group to which equated deltas are referred. This result indicates that the group on which the item analysis was based was less able than the standard reference group.

Assembling the Final Test Form

If the items in the item pool are the building blocks from which a new test form is built, the test specifications are the blueprint that guides the construction of a test form for operational use in the testing program. The specifications state the number of items of a particular item type and the level and range of difficulty of the items to be included in each separately timed section. In the usual case, the specifications will be designed so that the new test form will match recent previous forms in these respects.

For GMAT and other continuing testing programs, the number of items in each separately timed section in relation to the time limits is regularly monitored for each new test form. Although several indicators are used for this purpose, the percentage answering the last item may serve to illustrate the principle. As a general guideline, it is considered that, if about 80% of the examinees attempt the last item, the time limits and number of items are reasonably consistent with each other. The results for this indicator for the two most recent forms for which results are available are as follows:

                                 % Reaching Last Item
Section                           Form A    Form B

Reading Comprehension              81.4      74.8
Problem Solving                    20.5      39.2
Practical Business Judgment        88.2      83.0
Data Sufficiency                   77.3      70.1
Usage                              61.4      64.3
Practical Business Judgment        83.6      90.6

These results suggest that Usage and, to a greater extent, Problem Solving may include a few more items within the allotted time than would be optimal. Particularly for Problem Solving, however, the results may be affected by the tendency of some examinees not to tackle the more difficult items even when they have time to do so. To the extent that this occurs, the percentage attempting the last item cannot be regarded as a satisfactory indicator of the degree to which the time allowed and the number of items are suitably matched.

If a new item type is to be introduced, it is necessary to make a judgment concerning the number of items that can be completed by the great majority of examinees in the allotted time. This judgment is guided by experience with the item type in pretest studies or in other testing programs. It is also necessary to judge the appropriate level and range of item difficulties so that the new test will be appropriate for GMAT takers.

Once specifications are set, the tasks of selecting a set of items that will fulfill the desired specifications and of arranging the items in a suitable manner can be performed. Items are often arranged in an ascending order of difficulty, but other considerations, such as grouping similar items, may be given priority in determining the arrangement of items. Finally, as an essential test development step, suitable directions to the examinee must be provided.

The draft test is reviewed to insure that editorial and printing layout rules have been followed and that errors in the items or in the designation of the correct answers have not been introduced. Each separately timed part is reviewed with respect to content balance, and the test as a whole is reviewed from the viewpoint of how women and ethnic minorities are presented in items that refer to individuals or groups.

III. ADMINISTERING THE TEST

Maintaining Uniform Testing Conditions

Administering the test under uniform conditions is essential if scores earned by different examinees are to be strictly comparable. It is especially important that the time limits for each separately timed test section be uniform for all examinees, that the test directions be fully understood, and that distractions be held to a minimum. Because an examinee who had access to test items before the examination would gain an unfair advantage over other examinees, elaborate precautions are taken to keep test booklets secure before, during, and after the examinations. Examinees are not permitted to use any kinds of extraneous materials (e.g., dictionaries, calculators, notes) during the test, and supervisors and proctors are cautioned to be alert to prevent copying. To insure that the person for whom scores are reported is actually the person who took the test, each examinee is asked to provide positive identification, and this identification is checked by the supervisor or proctor before the examination begins. From a logical viewpoint, the goal of insuring that each examinee is tested under uniform conditions calls for thorough efforts to preserve test security and to prevent copying and impersonations.

The key to maintaining uniform conditions during the testing sessions is the selection of supervisors who have good judgment and who take a highly responsible attitude toward administering the tests. In addition, the GMAT Supervisor's Manual provides specific information on the many detailed tasks that supervisors need to perform. From the time when the examinees have been seated until the examinees are dismissed, each statement to be made in conducting the test is specified by the manual and is read verbatim by the person administering the test in a particular room. Finally, the manual includes a brief form on which the supervisor is asked to report any significant irregularities affecting individual candidates (e.g., illness, defective test materials) or affecting a group of candidates (e.g., mistiming). These Supervisor's Irregularity Reports identify any significant deviations from uniform testing conditions. Each reported deviation is given to an appropriate ETS staff member for action. Ordinarily, the action is based on guidelines or procedures established for handling various difficulties. For example, if a supervisor reports suspected copying, the irregularity is referred to the staff group concerned with test security. Again, if the supervisor discovers that a mistiming has occurred, the report is evaluated by a GMAT program direction staff member, possibly in consultation with test statisticians, to determine whether a special test administration may be needed, or whether some other solution is appropriate.

Because any breach of test security involves a risk that some examinees will gain an unfair advantage, the care that is taken within ETS and by the companies responsible for printing the tests to protect the security of the tests is an essential part of maintaining uniform test conditions. After the tests have been administered, further steps for detecting copying or impersonation may be performed, based on analyses of answer patterns or handwriting comparisons. These procedures are followed if a school questions the scores earned by one of its applicants or if a person repeating the test has shown an exceptionally large score gain. The procedures followed have been carefully designed both to protect the examinee whose score is bona fide and to avoid reporting a score that is not a fair representation of the examinee's ability because he or she has copied or has been impersonated.

Preparing the Examinee for the Test

Over and above the need for maintaining uniform conditions at the test administration, an obligation has been accepted to provide examinees with information about the test and how to approach it. In this way, the possible advantage of more sophisticated test-takers should be minimized. Consequently, the Bulletin of Information supplied to prospective examinees goes beyond providing the necessary information about the mechanics of registration and procedures for dealing with exceptional conditions that may arise. In addition, it provides a carefully prepared set of sample items designed to give examinees a realistic idea of the kinds of items that they may expect to encounter on the actual examination. Moreover, there is a discussion of what GMAT measures and of the role it plays in admission to graduate management schools. This discussion should be helpful, for example, in reassuring examinees who have an exaggerated idea of the importance of test scores in admissions.

In recent years, the Council has published a complete sample test for use by examinees. The most recent publication is in the 1979-80 Guide to Graduate Management Education. Moreover, both general test-taking suggestions and advice on how to deal with each item type have been provided. An examinee who works through the sample test and who adheres to the time limits for each section should have a very good idea of the demands that the actual test will make. Because an answer key is also provided, and a number of the items are discussed, the prospective examinee may obtain an idea of how well he or she has done, and may be alerted to the need for careful attention to all information given by an item in choosing an answer. The main value of the sample test may well be that working through it enables the examinees to adopt a more realistic attitude in approaching the actual test.

One point of test-taking strategy that has received special attention arises from the fact that GMAT scores are corrected for guessing; that is, a percentage of the number of wrong answers is subtracted from the number of right answers. This procedure is designed to discourage blind guessing. It is important, however, for examinees to understand that they should answer questions about which they have some information even if they are not sure of the correct answer. In order to emphasize this point, the person administering the examination reads the following statement to the examinees:

"Although you have already read instructionsabout guessing. they are very important, and I

have been asked to summarize them before youbegin the test. Your GMAT scores will be based onthe number of questions you answer correctlyminus a fraction of the number you answer incor-rectly. Therefore, It Is unlikely that mere guessingwill improve your scores significantly, and it doestake time. However, if you have some knowledgeof a question and can eliminate at least one of theanswer choices as wrong, your chance of gettingthe right answer is improved, and it will be to youradvantage to answer the question. If you knownothing at all about a particular question, it isprobably better to skip it."

IV. FACILITATING SCORE INTERPRETATION

Defining the Score Scales

An important step in initiating a new testing program is the definition of the score scale in terms of which test performance is reported. The definition of the scale is particularly important when, as in GMAT, there is a continuing program of building new test forms composed mainly or entirely of new test items and yielding scores that are interchangeable with scores on earlier forms of the test. Any change in the definition of the score scale in a continuing program conflicts with strict comparability of scores from one test form to another and thus introduces confusion and possible error.

Several widely used tests (College Board Scholastic Aptitude Test, Law School Admission Test, Graduate Record Examinations Aptitude Test) have scales so defined that some reasonably appropriate group of examinees has a mean score of 500 and a standard deviation of scores of 100. A further element in the definition may prescribe that no reported score can be higher than 800 or lower than 200. The GMAT scale was so defined that it had a mean of 500 and a standard deviation of 100 for the base group and a possible range from 200 to 800. In establishing the GMAT scale, the base group that was used included all examinees tested in February, May, and August 1954. In effect, this choice of the reference group assumed that the ability level of examinees in future years would not change so drastically as to require a revision of the definition of the scale. Many graduate management schools find the score range 650-800 important in admissions decisions, and a substantial number of examinees score in the 200 to 350 range. There has been no apparent need to expand or contract the range of possible scores. Thus, the definition of the scale on the basis of 1954 examinees remains satisfactory. Of course, the scale value of 500 has not represented the average performance of examinees for many years. The current average score, now about 460, can be found only by consulting descriptive statistics on current examinees.

The choice of the 200 to 800 scale has the advantage that scores on GMAT cannot be confused with IQs, percentage grades, or percentile ranks. It is true, however, that the use of the same numerical scale for GMAT as for other widely used admissions tests may occasionally cause confusion if it is thought, for example, that a 500 on GMAT has the same meaning as a 500 on the Law School Admission Test. Both because the various tests measure somewhat different abilities and because the examinee groups used in defining the score scales were different for the different tests, this assumption of comparability is clearly unwarranted.

The score scales for the Verbal and Quantitative tests were established on the basis of the same group used for defining the Total score scale. However, the score scales for these tests were set so that the base group would have a mean of 30 and a standard deviation of 8 for these scores. As part of the definition, it was decided that no scaled score for Verbal or Quantitative could be lower than 0 or higher than 60. The use of different units minimizes the risk that the part scores will be confused with the total scores and serves as a reminder to users that these tests were designed to supplement rather than to replace the total score.

Score Equating

In a program that offers tests at several administrations each year, and gives different test forms at each administration, a procedure that will permit scores on the different test forms to be used interchangeably is essential. Only in this way can admissions officers confidently assume that scores earned at different test administrations are comparable.

If two different forms of the same test are to yield interchangeable scores, it is important that the composition of the two tests be as similar as possible. Even modest changes in the weight given to various abilities result in some loss of strict comparability. This fact does not preclude changes in the test, but it does emphasize the need for considering changes carefully before introducing them. It is also highly desirable that the difficulty level of items in the new form be matched as closely as possible with the difficulty level of items in the previous form. It is this use of item difficulties in test construction that makes the precise determination of item difficulty indexes, discussed earlier in this report, so important.

Assuming that the new test form has been carefully matched with previous forms with respect to abilities measured and difficulty level, score equating completes the process of making scores fully interchangeable between the new form and previous forms.

The method of score equating described in this report was introduced in 1962 and has served as the basic method for score equating in GMAT since that time. It is expected that extensive modifications in GMAT equating procedures will be required as the result of legislation enacted by New York State requiring disclosure of test questions following each test administration. This section, accordingly, should be considered as describing how score comparability was maintained during the period from 1962 to the present time.


This method of equating calls for administering each new form with an old form so that the groups taking each form are substantially equal in ability. In the simplest application of this method, the old form and new form are alternated in each package of test books. When the number of examinees taking each form is large, it can safely be assumed that this process will produce groups that are closely matched in ability level. Because the old form has already been equated, it is possible to calculate the mean and standard deviation of reported scores on GMAT Total for the group taking it. Then, equating can be done by determining an equation that, when applied to raw scores on the new form, will yield a mean and standard deviation equal to the mean and standard deviation of reported scores on GMAT Total for the group taking the old form. The same procedure is used for equating Verbal and Quantitative scores for the new test form.

The development of the linear equation relating raw scores to reported scores on the new form can be described briefly. For the old form, the equation relating raw scores to reported scores calls for multiplying the raw score by a constant (A_O) and adding a constant (B_O). We want to determine values of A_N and B_N for the new form so that the reported scores for the new form and the old form will have the same mean and standard deviation for the two equating groups.

If the standard deviations of reported scores for the two equating groups are to be made equal, we may write

A_N σ_N = A_O σ_O,

where σ_N and σ_O are the raw-score standard deviations of the new and old forms, respectively. Then

A_N = A_O (σ_O / σ_N).

Suppose, for example, that the new form has a larger raw-score standard deviation than the old form. Then A_N will be proportionately smaller than A_O, and equating will compensate for this difference between the two test forms.

Similarly, if the mean reported scores for the two equating groups are to be equal,

A_N M_N + B_N = A_O M_O + B_O,

where M_N and M_O are the mean raw scores for the new and old forms, respectively, so that

B_N = A_O M_O + B_O - A_N M_N.

Thus the new multiplier (A_N) and the new additive constant (B_N) may readily be determined using the mean and standard deviation of raw scores on the new and old forms and the A_O and B_O values for the old form.

A simplified example of the way this equation works can be developed if we suppose that A_O is equal to A_N. Then, if the new form is easier than the old form, its mean raw score will be higher than the mean raw score for the old form, so that M_N will be larger than M_O. Because the product of A_N with M_N will be larger than the product of A_O with M_O, application of the equation will make B_N smaller than B_O. Thus, if standard deviations are equal, equating compensates for an easier test by reducing the size of the additive constant. Of course, under realistic conditions, the relative size of B_O and B_N will depend both on the relative size of A_O and A_N and on the relative size of M_O and M_N.

In some instances, three forms are packaged in a sequence, making it possible to equate one new form to two old forms, and thus to check on the precision of equating, or to equate two new forms to one old form. A chart showing the linkages involved in maintaining the GMAT score scale is shown in Figure 3. It will be noted that each form shown has been linked directly or indirectly to the first form used in the program.

The basic equating method used for GMAT requires, as a practical matter, that both the old and the new forms be administered in the same testing room. On the rare occasions when this condition cannot be fulfilled, more complex procedures utilizing common items are necessary. A comprehensive discussion of equating methods can be found in Angoff (1971). Appendix B of this report provides a sample of the calculations involved in determining the linear equation relating raw scores to scaled scores.

Validity Studies: Purpose and Background

From the beginning of the program, an important basis for score interpretation has been provided by validity studies. These studies provide objective evidence on the extent to which scores predict subsequent performance. These studies have generally used first-year average grades as the measure of academic achievement and test scores and undergraduate average grades as predictors. Because undergraduate grades have a long history of acceptance as one factor in admissions, the relative validity of the test scores and previous grades is a point of special interest. A closely related question is the extent to which the use of test scores along with previous grades results in more effective prediction.

The first validity studies were initiated in 1955, as soon as the students tested in 1954 had earned first-year grades (Olsen, 1957). A second series of studies was initiated in 1958 as part of an effort to evaluate possible new item types for inclusion in the test (Pitcher, 1960). In 1963, all schools represented on the Council were invited to participate in validity studies, and 19 did so. During the three-year period 1967-68 through 1969-70, 67 graduate schools participated in a comprehensive program in which the study reports were supplemented by regional seminars at which admissions methods as well as study results were discussed (Pitcher, 1972). For a number of years subsequent to this major effort, the responsibility for conducting validity studies rested solely with the schools that use the tests. Recently, a validity study service has been instituted. The new service emphasizes flexibility by facilitating the use of additional predictors beyond test scores and undergraduate average grades, of additional measures of success beyond first-year grades, and of subgroups as well as the total student group.

Figure 3. Genealogical chart showing linkages between GMAT forms. [Chart: forms from administrations in 1954, 1955, 1957, 1958, 1961, 1962, 1966, 1967, 1968, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, and 1978, each linked directly or indirectly to the first form used in the program.]

A manual for users of the service has been prepared by Powers and Evans (1977). This manual is included as one section of the GMAC Handbook. In 1977-78, 10 schools participated in a pilot study of the new service (Powers and Evans, 1978), and during 1978-79, 25 schools participated in the service. Nearly all participating schools have taken advantage of the options provided by the new system.

Validity Studies: Methods

Because validity study results are customarily expressed in terms of correlation coefficients, a brief discussion of this index should be useful. Figure 4 presents graphically the relationship between test scores and grades for a group of students. Consideration of the plotted points reveals a clear upward trend. The correlation coefficient provides a widely accepted, objective method of summarizing a set of data of this kind. Essentially, the coefficient measures how closely a straight line fitted to the data describes all the points. When the trend is upward from left to right, the coefficient is given a plus sign; when the trend is downward, the coefficient is given a minus sign. If all the points fall on the line, the correlation is perfect and the coefficient is 1.0. When there is no upward or downward trend in the line, the coefficient is zero. Validity data generally show a clear upward trend with the plotted points scattered above and below the trend line, as in Figure 4. The correlation coefficient provides a convenient way of summarizing a complicated set of data in the form of a single number. Although many other ways of analyzing validity data are useful for various purposes, none has approached the correlation coefficient in general acceptance.

Figure 4. How the line of relation summarizes the main trend of the relationship between test scores and grades. (Each dot represents one student.) The correlation coefficient for these data is about .50. [Scatterplot: test scores from 400 to 800 on the horizontal axis, grades from 2.5 to 4.0 on the vertical axis, with a fitted trend line.]

When results of validity studies are being considered, the question arises as to just how close a relationship a particular correlation represents. Table 1 has been prepared to provide a partial answer to this kind of question. In preparing this table, the group was divided into top fifth, middle three-fifths, and low fifth on the predictor and on the measure of success. Then the probability that students standing at each level on the predictor will attain each level on the measure of success was determined, using published tables of probabilities for the bivariate normal distribution (Schrader, 1965). For example, with a correlation of .50, a student in the top fifth on the predictor has 44 chances in 100 of scoring in the top fifth and only 4 chances in 100 of scoring in the bottom fifth on the measure of success. Consideration of the data presented in Table 1 provides a reasonable estimate of the strength of relationship represented by different levels of correlation coefficients.
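The probabilities in Table 1 can be reproduced with modern statistical software in place of published tables. The sketch below assumes SciPy is available (the helper function is ours); for a correlation of .50 it recovers the 44-in-100 and 4-in-100 figures cited above.

    from scipy.stats import norm, multivariate_normal

    r = 0.50
    z80 = norm.ppf(0.80)  # cutoff separating the top fifth (about 0.84)
    z20 = norm.ppf(0.20)  # cutoff separating the bottom fifth

    def bvn_cdf(a, b, rho):
        # P(X <= a, Y <= b) for a standard bivariate normal with correlation rho.
        return multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]]).cdf([a, b])

    p_x_top = 0.20  # probability of standing in the top fifth on the predictor

    # Joint probabilities for predictor in the top fifth and criterion in each group.
    p_top_top = 1 - 2 * norm.cdf(z80) + bvn_cdf(z80, z80, r)  # X > z80 and Y > z80
    p_top_bottom = norm.cdf(z20) - bvn_cdf(z80, z20, r)       # X > z80 and Y <= z20

    print(round(p_top_top / p_x_top, 2))                       # about 0.44
    print(round(p_top_bottom / p_x_top, 2))                    # about 0.04
    print(round(1 - (p_top_top + p_top_bottom) / p_x_top, 2))  # about 0.52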

Because the combined effectiveness of two or more predictors is often the primary concern in validity studies, the multiple correlation coefficient, which expresses the correlation between the measure of success and the best-weighted total of scores on two or more predictors, is a valuable tool. For this purpose the weights are determined statistically so that the correlation will be as high as possible for the set of data being analyzed. Thus, the multiple correlation coefficient of graduate school first-year grades with a combination of GMAT scores and undergraduate grades can be compared with the corresponding correlation coefficient based on undergraduate grades only.
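As a sketch of the computation (Python with NumPy; the data are invented for the example and come from no GMAT study), the multiple correlation is simply the correlation between the criterion and the least-squares-weighted composite of the predictors:

    import numpy as np

    # Invented data: columns are undergraduate average, GMAT-Verbal, GMAT-Quantitative.
    X = np.array([[2.8, 28, 31], [3.2, 34, 30], [3.0, 25, 35],
                  [3.6, 38, 40], [2.5, 22, 24], [3.9, 41, 37]])
    y = np.array([2.6, 3.1, 2.9, 3.5, 2.3, 3.7])  # first-year graduate grades

    # Least-squares weights (with an intercept) make the correlation between the
    # weighted composite and the criterion as high as possible for this sample.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    composite = design @ beta

    multiple_r = np.corrcoef(composite, y)[0, 1]
    print(round(multiple_r, 2))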

Table 1

Relation Between Standing on Predictor and Standing on Criterion for Various Values of the Correlation Coefficient

                                        Per Cent of Students Standing
                                           in Each Criterion Group
Correlation    Standing on             Bottom    Middle          Top
Coefficient    Predictor               Fifth     Three-Fifths    Fifth

.10            Top Fifth               16        60              24
               Middle Three-Fifths     20        60              20
               Bottom Fifth            24        60              16
.20            Top Fifth               13        59              28
               Middle Three-Fifths     20        60              20
               Bottom Fifth            28        59              13
.30            Top Fifth               10        57              33
               Middle Three-Fifths     19        62              19
               Bottom Fifth            33        57              10
.40            Top Fifth                7        55              38
               Middle Three-Fifths     18        64              18
               Bottom Fifth            38        55               7
.50            Top Fifth                4        52              44
               Middle Three-Fifths     17        66              17
               Bottom Fifth            44        52               4
.60            Top Fifth                2        48              50
               Middle Three-Fifths     16        68              16
               Bottom Fifth            50        48               2
.70            Top Fifth                1        43              56
               Middle Three-Fifths     14        72              14
               Bottom Fifth            56        43               1
.80            Top Fifth                0.2      35.4            64.4
               Middle Three-Fifths     11.8      76.4            11.8
               Bottom Fifth            64.4      35.4             0.2
.90            Top Fifth               (0.002)   25.2            74.8
               Middle Three-Fifths      8.4      83.2             8.4
               Bottom Fifth            74.8      25.2            (0.002)

Validity Studies: Results

The primary concern in this summary of validity study results will be with the validity of undergraduate average grades used alone and with the validity of an optimally weighted total of undergraduate average grades, GMAT Verbal scores, and GMAT Quantitative scores. Median correlation coefficients for five studies bearing on this question will be shown, as follows:

(a) 1963-64 results for 17 schools that participated in both the 1963-64 studies and the 1967-70 studies;

(b) 1967-70 results for the same 17 schools;

(c) 1967-70 results for 69 studies conducted for 67 participating schools;

(d) 1977-78 results for 10 schools participating in the pilot study of the new Validity Study Service; and

(e) 1978-79 results for 25 participating schools.

In order to enhance the comparability of results between the earlier and more recent studies, the medians of the multiple correlation coefficients were calculated for the earlier studies, using results published in Pitcher's (1972) survey of pre-1972 studies.

Table 2 shows the results for the five comparisons. Perhaps the most striking finding is the fact that the multiple correlation obtained by the use of three predictors (undergraduate average grades, GMAT-Verbal, and GMAT-Quantitative) is substantially larger than the correlation of undergraduate average grades used alone. The increment in validity attributable to GMAT scores ranges from .16 for the 67 schools in the 1967-70 studies to .22 for the 10 schools in the 1977-78 studies. Except for the first two sets of coefficients shown in Table 2, the interpretation of the comparisons is obscured by the fact that the schools represented in the medians differ from one set of studies to another. The importance of this point is supported by the fact that the 1963-64 and 1967-70 studies show virtually identical results when the comparison is limited to schools that participated in both studies. Under these circumstances, it is difficult to evaluate possible trends in the results, particularly because the median validity of undergraduate average grades ranges only from .23 to .29 and the multiple correlation ranges only from .39 to .48 in the five sets of studies.

Table 2

Median Validity Coefficients for Undergraduate Average Grades Separately and in Combination with GMAT Test Scores

                                        Correlation with First-Year Grades

Years in Which       Number        Undergraduate      Undergraduate Average Grades,
Studies Were Done    of Schools    Average Grades     GMAT-Verbal, GMAT-Quantitative

1963-64              17*           .29                .47
1967-70              17*           .28                .46
1967-70              67            .23                .39
1977-78              10            .23                .45
1978-79              25            .27                .48

*The same 17 schools participated in both sets of studies.

Content and Construct Validity

Tests that are used extensively in admissions have traditionally been evaluated in terms of their effectiveness in predicting academic performance. At the same time, the abilities tested have been judged to be relevant to the kind of tasks that graduate management students are called upon to perform. Considerations of this kind have been implicit in the selection of item types to be tried out for possible inclusion and in decisions about what item types to introduce into GMAT. The survey of 19 graduate schools by Casserly and Campbell (1973) represented a more systematic effort to approximate a job analysis of graduate study. Their survey provided strong support for the relevance of the verbal and quantitative abilities measured by GMAT. Their finding that written English was important contributed to the decision to include the Usage section in current forms of GMAT.

In recent years the concept of construct validity has received increasing attention from testers. Construct validation calls for systematic efforts to find out what a test measures, studying the relation of test performance to other variables, and the development and testing of tentative theories that account for the observed results (Cronbach, 1971). The well-established principle that GMAT scores and undergraduate average grades supplement each other in the prediction of academic performance shows how much can be gained by using different measures jointly rather than in isolation. Studies of such factors as age, sex, ethnic group membership, undergraduate major field, and previous business experience may be regarded as steps toward developing construct validity. Also relevant are the long-range prediction studies by Harrell (1969) and by Crooks and Campbell (1974) and studies of factors affecting test performance, such as the speededness study done by Evans and Reilly (1972). Although construct validation presents formidable tasks, it represents a promising approach to better use of test scores.

Reliability of Test Scores

Because GMAT scores often play a significant role in decisions about individuals, high standards of reliability for these scores have been maintained since the program was begun. The importance of reliability for score interpretation arises from the fact that it measures the consistency of individual scores from one test form to another. Unless test scores are highly reliable, an individual's relative standing, and hence his or her score, will show excessive fluctuation from one form to another.

The logic of reliability is perhaps most readily understood if it is thought of as the correlation coefficient between scores on two forms of the same test. If the two forms are closely matched with respect to the abilities that they measure, and if they include a reasonably large number of questions, we would expect each examinee's relative standing to be quite similar on the two forms. Thus we would expect that, if we administered both forms to each member of a large group of examinees and calculated the correlation coefficient between the scores on the two forms, the resulting reliability coefficient would be relatively high. Rather than calculating reliability coefficients by this relatively cumbersome procedure, it is customary to use certain theoretical developments to estimate what the correlation would be between scores on a test form and a similar form. A more detailed account of the procedures used in calculating reliability coefficients is given in Appendix B.

The reliability coefficients for four recent forms of GMAT are as follows:

                  Reliability Coefficient of:

Form    GMAT Total    Verbal    Quantitative
A       .92           .90       .86
B       .92           .89       .87
C       .92           .90       .88
D       .93           .90       .88

These coefficients are based on a cross section of examinees at regular administrations of GMAT. The reliability coefficients for GMAT Total are, as would be expected, somewhat higher than those for the two part scores. However, the part scores may be considered to have acceptable reliability, particularly if both are taken into account in making decisions about individuals. The extent of relationship represented by a correlation of .90 is shown in Table 1.

Standard Error of Measurement

Although the reliability coefficient is useful for judging whether a test yields sufficiently reliable scores to warrant using the scores as one element in important decisions, the standard error of measurement is more useful in judging how much individual scores would fluctuate from one form to another.

If a person could take a large number of forms of the same test, we may assume that his or her scores on these forms would follow a normal distribution with a standard deviation equal to the standard error of measurement. The mean score on all the forms is called the individual's true score. Thus, if the GMAT Total score has a standard error of measurement of 30 and if an examinee has a true score of 500, we would estimate that approximately two-thirds of his or her observed scores would fall between 470 and 530 and 95 percent would fall between 440 and 560. There is no way of knowing, of course, whether the person's observed score on a particular occasion is higher or lower than his or her true score. The main value of the standard error of measurement is in providing some idea of how much variation in observed scores we would expect to find if a person took different forms of the same test.

The size of the standard error of measurement of scores on GMAT for four recent forms was as follows:

           Standard Error of Measurement of:

Form    Total Score    Verbal    Quantitative
A       29             3         3
B       29             3         3
C       30             3         3
D       28             3         3

Standard Error of a Score Difference

When an examinee repeats a test, a number of factors, including the effect of practice, growth in ability during the interval between tests, and differences in motivation and anxiety, may affect the differences in scores. It is possible, however, to estimate the probability that various score differences are attributable to standard error of measurement. For this purpose we need to know the standard error of the difference. If the two tests have equal standard errors of measurement, the standard error of the difference is simply the standard error of measurement times √2. Thus, if the standard error of measurement is 30, the standard error of the difference of two scores would be 42. Assuming that the differences are normally distributed, we could conclude that, if the person's true score remains the same, about two-thirds of the differences would be 42 points or less and 95 percent would be 84 points or less. The same reasoning can be followed in estimating the likelihood of various score differences for persons having the same true scores.
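The arithmetic of the last two sections is easily scripted. A minimal sketch in Python, using the illustrative values from the text (a standard error of measurement of 30 and a true score of 500):

    import math

    sem = 30          # standard error of measurement of GMAT Total
    true_score = 500

    # About two-thirds of observed scores fall within one SEM of the true score,
    # and about 95 percent fall within two SEMs.
    print(true_score - sem, true_score + sem)          # 470 530
    print(true_score - 2 * sem, true_score + 2 * sem)  # 440 560

    # For two administrations with equal SEMs, the standard error of the
    # difference is the SEM times the square root of 2.
    se_diff = sem * math.sqrt(2)
    print(round(se_diff), 2 * round(se_diff))  # 42 84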

Table 3

Percentages of Candidates Tested from November 1975 through July 1978 Who Scored below Selected Total Test Scores

Score    Percentage Below
 700         99
 675         98
 650         97
 625         94
 600         90
 575         85
 550         79
 525         71
 500         63
 475         53
 450         44
 425         36
 400         --
 375         21
 350         16
 325         11
 300          8
 275          5
 250          3
 225          2

Number of Candidates     457,103*
Mean                     461
Standard Deviation       107

*Candidates included were self-selected.


The fact that the standard error of measurement and the standard error of the difference can be computed for test scores offers some aid in interpreting test scores, by indicating the limitations of scores even when tests are professionally constructed and accurately scored. Although standard errors of measurement are not available for non-test measures such as undergraduate average grades, it is plausible that they, too, would vary if the person had attended a different college, followed a different program of courses, or even had different instructors. In summary, although an individual's test scores are facts, inferences about the individual based on these facts are more realistic if account is taken of the fact that the reliability of the scores is less than perfect.

Descriptive Statistics on Test Scores

Although 500 was the mean score for examinees in 1954, the mean of all GMAT takers for November 1975 through June 1978 is 461. Thus, an examinee or admissions officer who wishes to know how well a score compares with those earned by the whole group of examinees needs information on the distribution of scores for current applicants to graduate management schools who take GMAT. This kind of information is provided in the Guide to the Use of GMAT Scores prepared for admissions officers and admissions committees and in the GMAT Candidate Score Interpretation Guide. The most recent table for GMAT Total from the 1978-79 Guide to the Use of GMAT Scores is shown in Table 3. Users of this information are reminded that the group includes only those prospective graduate students of management who take the GMAT, and is thus self-selected. From the viewpoint of examinees, these descriptive statistics provide a rough idea of how well they have performed on the test. Because graduate management schools differ substantially in the score level of applicants that they attract and of students whom they enroll, examinees are urged, in the Bulletin of Information, to talk with a placement or counseling officer in their undergraduate college. By considering the student's test scores in relation to his or her college record, and by drawing on experience with the success of students with various credentials in gaining admission to various graduate management schools, the placement or counseling officer can often provide a more meaningful interpretation of the scores than is provided by national statistics.

Biographical Information for GMAT Examinees

In recent years, GMAT examinees have been asked to answer nine biographical data questions. For most of these questions, their answers are transmitted as part of the GMAT report to schools that they designate. Two questions, concerned with self-reported language fluency and with population subgroup membership, are asked solely for research purposes, and the responses given by an individual to those questions are never reported. Results for a few questions will be reported here because they help to describe the total group of examinees. Unless otherwise noted, the results are based on 457,730 questionnaires completed in 1975-76, 1976-77, and 1977-78. (Because test repeaters completed a questionnaire each time they were tested, the number of individuals included in the sample is appreciably less than the number of questionnaires analyzed.)

Table 4

Representation of Undergraduate Major Fields in GMAT Examinee Group, 1975-1978

Major Field                          Percent of Examinees

Business and Commerce (39.9%)
  Accounting                         13.9
  Management                          9.3
  Marketing                           5.1
  Finance                             4.1
  Business Education                  1.5
  Industrial Relations                0.5
  Hotel Administration                0.2
  Other Business and Commerce         5.3

Social Science (25.2%)
  Economics                           8.0
  Psychology                          3.7
  Political Science                   3.5
  History                             3.4
  Education                           2.0
  Sociology                           1.9
  Government                          0.5
  Other Social Science                2.1

Science (21.5%)
  Engineering                        10.3
  Mathematics                         3.0
  Biological Sciences                 2.9
  Chemistry                           1.5
  Computer Science                    0.9
  Physics                             0.6
  Architecture                        0.4
  Statistics                          0.2
  Other Science                       1.7

Humanities (7.1%)
  English                             3.0
  Foreign Language                    1.7
  Fine Arts                           1.0
  Philosophy                          0.9
  Other Humanities                    0.5

Other Major (6.2%)

(A small percentage of the total group of examinees did not report an undergraduate major.)

Graduate students in management are drawn from a wide spectrum of undergraduate major fields, as shown in Table 4. Roughly two-fifths of the examinees majored in Business and Commerce, about one-fourth majored in Social Sciences, and over one-fifth majored in Science. When major fields are considered separately, Accounting (13.9%), Engineering (10.3%), Management (9.3%), and Economics (8.0%) are most heavily represented in the examinee group. Indeed, these four fields account for more than two-fifths of all examinees. Another background item of special interest concerns the number of months of full-time work experience reported by the examinees. Results for the total group may be summarized as follows:

No response or zero months    16.7%
1-12 months                   19.9%
13-24 months                  14.6%
25-60 months                  23.9%
61 or more months             24.9%

In this group nearly one-half (48.8%) had more than two years of full-time work experience and nearly one-fourth (24.9%) had more than five years of work experience.

Table 5

Representation of Male and Female U.S. Citizens Who Reported Membership in Various Population Subgroups in the GMAT Examinee Group, 1975-78

                                       Percentage of
Population Subgroup                    Examinee Group

American Indian (0.3%)
  Male                                 0.2
  Female                               0.1
Black/Negro/Afro-American (6.5%)
  Male                                 3.8
  Female                               2.7
Caucasian/White (87.4%)
  Male                                 64.1
  Female                               23.3
Mexican American/Chicano (0.7%)
  Male                                 0.6
  Female                               0.1
Oriental/Asian (3.0%)
  Male                                 2.1
  Female                               0.9
Puerto Rican (0.4%)
  Male                                 0.3
  Female                               0.1
Other (1.7%)
  Male                                 1.3
  Female                               0.4

Of the total group of examinees, 15% reported that they were not U.S. citizens, 4.9% said they did not care to respond to the question on population group membership, and 4.8% did not respond to the question.


Many GMAT examinees plan to pursue their graduate studies on a part-time basis. In the 1975-78 group, 46.9% reported that they planned to enroll full-time, 41.1% reported that they planned to enroll part-time, and 12.0% were undecided.

Representation of various population subgroups in the 1975-78 examinee group is shown in Table 5. It is a matter of some interest that the six population groups other than Caucasian/White constitute 12.6%, or about one-eighth, of the examinees who are United States citizens and who reported their population group membership.

A separate tabulation of men and women, based on all but two members of the total 1975-78 sample, showed that 74.3% were male and 25.7% were female. (These results differ slightly from the totals for males and females for the sample on which Table 5 was based.) Evidence from other tabulations shows a rising percentage of female examinees: for the 1973-75 group, the percentage was 18.2, and for the 1978-79 group it was 31.5.

The great majority of GMAT examinees take the examination after college graduation. A tabulation of responses for 1977-78 examinees showed the following distribution of years of graduation (or expected graduation):

1979                2.7%
1978               25.4%
1973-77            49.6%
1972 or earlier    22.3%

In this group, well over two-thirds (71.9%) of examinees were tested after they had completed college, and more than one-fifth (22.3%) were tested more than five years after completing college.


REFERENCES

Angoff, W. H. "Scales, Norms, and Equivalent Scores." In Thorndike, R. L., editor, Educational Measurement, Second Edition. Washington: American Council on Education, 1971.

Casserly, P. L., & Campbell, J. T. A Survey of Skills and Abilities Needed for Graduate Study in Business. ATGSB Research and Development Brief Number 9. Princeton, N.J.: Educational Testing Service, 1973.

Cronbach, L. J. "Test Validation." In Thorndike, R. L., editor, Educational Measurement, Second Edition. Washington: American Council on Education, 1971.

Crooks, L. A., & Campbell, J. T. Career Progress of MBAs: An Exploratory Study Six Years After Graduation. Program Report PR-74-8. Princeton, N.J.: Educational Testing Service, 1974.

Evans, F. R., & Reilly, R. R. The Test Speededness Study: A Study of Test Speededness as a Potential Source of Bias in the Quantitative Score of the Admission Test for Graduate Study in Business. ATGSB Research and Development Brief Number 8. Princeton, N.J.: Educational Testing Service, 1972.

Guilford, J. P., & Fruchter, B. Fundamental Statistics in Psychology and Education, Fifth Edition. New York: McGraw-Hill, 1973.

Harrell, T. W. Predicting Job Success of MBA Graduates. ATGSB Research and Development Brief Number 1. Princeton, N.J.: Educational Testing Service, 1969.

Olsen, M. The Admission Test for Graduate Study in Business as a Predictor of First-Year Grades in Business Schools, 1954-55. Statistical Report SR-57-3. Princeton, N.J.: Educational Testing Service, 1957.

Pitcher, B. The Effectiveness of Tests of Directed Memory, Organization of Ideas and Data Sufficiency for Predicting Business School Grades. Statistical Report SR-60-33. Princeton, N.J.: Educational Testing Service, 1960.

Pitcher, B., & Winterbottom, J. A. The Admission Test for Graduate Study in Business as a Predictor of First-Year Grades in Business School, 1962-63. Statistical Report SR-65-21. Princeton, N.J.: Educational Testing Service, 1965.

Pitcher, B. Report of Validity Studies Carried Out by ETS for Graduate Schools of Business, 1954-1970. Statistical Report SR-72-30. Princeton, N.J.: Educational Testing Service, 1972.

Powers, D. E., & Evans, F. R. Designing Your Validity Study: A Manual for the Graduate Management Admission Council Validity Study Service. Program Report 77-14. Princeton, N.J.: Educational Testing Service, 1977.

Powers, D. E., & Evans, F. R. Relationships of Preadmission Measures to Academic Success in Graduate Management Education. Research Bulletin RB-78-11 (prepublication draft). Princeton, N.J.: Educational Testing Service, 1978.

Schrader, W. B. "A Taxonomy of Expectancy Tables." Journal of Educational Measurement, 1965, 2, 29-35.


APPENDIX A

Sample Items for Item Types Used in GMAT

This appendix includes examples of each item type that has been included in GMAT since the test was first administered in 1954. For each item type, the month and year of the administration at which it was introduced is stated. For item types not currently in use, the month and year of the first administration at which it was replaced is stated.

For the item type designated as Verbal, three subtypes (Analogies, Antonyms, and Sentence Completion) are shown. For the Quantitative item type, two subtypes (Data Interpretation and Problem Solving) are shown.

Characteristically, item types that involve reading a passage or interpreting a tabular or graphic presentation include five multiple-choice items. For most samples given in this appendix, however, only one of the multiple-choice items is shown.

The correct response for each sample item is shown, either by starring the correct response or, in a few instances, by showing the correct response in parentheses.

Verbal

Introduced: 2/54
Replaced: 2/61
Restored: 2/66
Replaced: 10/76

Analogies

Directions: In each of the following questions, a related pair of words or phrases is followed by five lettered pairs of words or phrases. Select the lettered pair which best expresses a relationship similar to that expressed in the original pair.

Astronomy : Astrology

(A) chemistry : alchemy*  (B) biology : botany
(C) religion : mythology  (D) geography : geology
(E) medicine : magic

Antonyms

Directions: Each question below consists of a word printed in capital letters, followed by five words or phrases lettered A through E. Choose the lettered word or phrase which is most nearly opposite in meaning to the word in capital letters.

Since some of the questions require you to distinguish fine shades of meaning, be sure to consider all the choices before deciding which one is best.

DOUR: (A) blithe*  (B) talkative  (C) inflexible  (D) neat  (E) modish

Sentence Completion

Directions: Each of the sentences below has one or more blank spaces, each blank indicating that a word has been omitted. Beneath the sentence are five lettered words or sets of words. You are to choose the one word or set of words which, when inserted in the sentence, best fits in with the meaning of the sentence as a whole.

The manufacture of cupboards and doors, bathtubs and cooking stoves, taking place as it does in factories, should be unaffected by ______; but since the articles are parts of buildings and there is no demand for them unless buildings are going up, they too are ______ in activity.

(A) price .. sluggish  (B) cost .. expensive
(C) weather .. seasonal*  (D) methodology .. regulated
(E) policies .. unstable

Quantitative

Introduced: 2/54
(Currently in use)

Data Interpretation

Directions: In this section solve each problem, using any available space on the page for scratch work. Then indicate the one correct answer in the appropriate space on the answer sheet.

Per Cent of the Total Value of U.S. Lend-Lease Supplies Received by U.S. Allies

                              1st Year    2nd Year
Britain                       68%         38%
Russia                         5%         30%
All others                    27%         32%
Total Value of Supplies
(in billions of dollars)       2           8

What per cent of the total value of lend-lease supplies for both years was received by Russia and Britain combined?

(A) 31  (B) 44  (C) 69*  (D) 70.5  (E) 141

Problem Solving

If the length of a rectangle is increased by 10 per cent and the width by 40 per cent, by what per cent is the area increased?

(A) 4  (B) 15.4  (C) 50  (D) 54*  (E) 400

Best Arguments

Introduced: 2/54
Replaced: 11/61


Directions: The questions in this part are based on situations which involve some sort of dispute or disagreement. In most of the questions you will be asked to evaluate the arguments which might be offered by the disputants; some questions will require you to analyze the situations in other ways. You are to assume that these disputes are being brought before an intelligent lay arbitrator (not a court of law) for decision; the questions, therefore, will not involve any legal precedents or technicalities. You are to evaluate the situations objectively in terms of ordinary concepts of reasonableness and fair play and base your answers on a logical analysis of the facts and arguments as they are presented to you.

Bruce Bond, a broker, one morning overhears a famous financier say, "The price of American Beartrap stock will go sky-high within two weeks." Later that day Pete Goodfellow, an old friend to whom Bond owes many favors, calls on Bond to ask for advice about investments. He emphasizes that he wants to buy some stock on which he can make money quickly because he is in a tight financial spot. Bond says that American Beartrap is the best buy he knows at the moment. When Goodfellow protests that he has never heard of American Beartrap, Bond replies that the basis of his recommendation is information received from a reliable source. Goodfellow accepts the advice and invests heavily. Within two weeks American Beartrap stock has become virtually worthless, Goodfellow's entire investment is lost, and Goodfellow is ruined financially. Goodfellow thereupon accuses Bond of causing his financial downfall.

Which one of the following arguments best supports Goodfellow's accusation?

(A) Bond should not have presumed to give Goodfellow any advice.
(B) Goodfellow naturally believed that Bond wanted to help him.
(C) Bond had misrepresented his knowledge of the situation.*
(D) Bond should have cautioned Goodfellow not to invest too heavily in the stock.
(E) Bond had taken advantage of Goodfellow's obvious lack of knowledge of financial matters.

Quantitative Reading

Introduced: 2/54
Replaced: 11/61

There have been many suggestions that in an emergency the professional schools, particularly medical schools, accelerate their programs, thus graduating more trained men and women. If more doctors are to be trained we must have more of the three essentials of such a process -- teachers, students, and equipment -- or we must utilize those which we have to greater effect. But objections have been made to asking students and faculty to work through the four quarters of the year, and the plan herewith submitted, recognizing these objections, attempts instead a fuller utilization of the third essential, equipment and supplies, to realize effectively the objectives of an accelerated program.


The proposed plan, which is essentially the more frequent admission of freshman classes, is designed for those schools which operate on the quarter, as opposed to the semester, plan. Following this plan such a school could graduate four classes in three years by the admission of a freshman class every nine months.

An illustration from the table below may serve to clarify the proposal. In accordance with the plan, one class, indicated on the table by the letter A, would enter medical school as freshmen in the Summer Quarter of 1951, continue in school through three consecutive quarters, and go on vacation during the Spring Quarter of 1952. Students in class B would enter as freshmen in the Spring Quarter of 1952, continue in school through the Summer and Autumn Quarters of 1952, and go on vacation during the Winter Quarter, at which time students in class C would begin their freshman year. As can be seen from the table, there would always be a freshman, sophomore, junior, and senior class in school.

Quarter Plan Organization for Class Acceleration

[Table showing, for each academic year from 1951-52 through 1958-59, the classes in school (new classes A through K and previously admitted classes X, Y, and Z) in each of the Summer, Autumn, Winter, and Spring Quarters.]

Classes X, Y, and Z are included in the plan

(A) as the second, third, and fourth new classes
(B) as convenient symbols to take up the lapse before the admission of B
(C) as illustrations of the classes which work through four quarters a year
(D) to show how classes already formed fit into the new plan*
(E) to indicate the necessity for a summer recess

Organization of Ideas

Introduced: 11/61
Replaced: 11/66

Directions: Each set of questions in this section consists of a number of statements. Most of these statements refer to the same subject or idea. The statements can be classified as follows:

(A) the central idea to which most of the statements are related;
(B) main supporting ideas, which are general points directly related to the central idea;
(C) illustrative facts or detailed statements, which document the main supporting ideas;
(D) statements irrelevant to the central idea.


The sentences do not make up one complete paragraph. They may be regarded as the components of a sentence outline for a brief essay. The outline might, for example, have the following form:

A contains the central idea
   B contains a main supporting idea
      C presents an illustrative fact
   B contains a main supporting idea
      C presents an illustrative fact
      C presents an illustrative fact

Classify each of the following sentences in accordance with the system described above.

(B) The Roman roads connected all parts of the empire with Rome.
(B) The Roman roads were so well built that some of them remain today.
(A) One of the greatest achievements of the Romans was their extensive and durable system of roads.
(D) Wealthy travelers in Roman times used horse-drawn coaches.
(C) Along Roman roads caravans would bring to Rome luxuries from Alexandria and the East.

Directed Memory: Reading Recall

Introduced: 11/61
Replaced: 10/77

Directions: In the test you will be given a period of time for the study of several extended prose passages. Then, without looking back at the passages, you will answer questions based on their contents. The following exercise is much shorter than those appearing on the test, but it illustrates the general nature of the passages and the questions. Remember, though, that on the test you will not be allowed to refer back to the passages.

SAMPLE PASSAGE:

Soon after the First World War began, public attention was concentrated on the spectacular activities of the submarine, and the question was raised more pointedly than ever whether or not the day of the battleship had ended. Naval men conceded the importance of the U-boat and recognized the need for defense against it, but they still placed their confidence in big guns and big ships. The German naval victory at Coronel, off Chile, and the British victories at the Falkland Islands and in the North Sea convinced the experts that fortune still favored superior guns (even though speed played an important part in these battles); and, as long as British dreadnoughts kept the German High Seas Fleet immobilized, the battleship remained in the eyes of naval men the key to naval power.

Public attention was focused on the submarine because

(A) it had immobilized the German High Seas Fleet
(B) it had played a major role in the British victories at the Falkland Islands and in the North Sea
(C) it had taken the place of the battleship
(D) of its spectacular activities*
(E) of its superior speed

Data Sufficiency

Introduced: 11/61
(Currently in use)

Directions: Each of the questions below is followed by two statements labeled (1) and (2), in which certain data are given. In these questions you do not actually have to compute an answer, but rather you have to decide whether the data given in the statements are sufficient for answering the question. Using the data given in the statements plus your knowledge of mathematics and everyday facts (such as the number of days in July), you are to blacken space

A if statement (1) ALONE is sufficient, but statement (2) alone is not sufficient to answer the question asked;
B if statement (2) ALONE is sufficient, but statement (1) alone is not sufficient to answer the question asked;
C if BOTH statements (1) and (2) TOGETHER are sufficient to answer the question asked, but NEITHER statement ALONE is sufficient;
D if EACH statement ALONE is sufficient to answer the question asked;
E if statements (1) and (2) TOGETHER are NOT sufficient to answer the question asked, and additional data specific to the problem are needed.

(E) In a four-volume work, what is the weight of the third volume?

(1) The four-volume work weighs 5 pounds.
(2) The first three volumes together weigh 6 pounds.

Practical Business Judgment

Introduced: 11/72
(Currently in use)

Directions: The passage in this section is followed by two sets of questions, data evaluation and data application. In the first set, data evaluation, you will be required to classify certain of the facts presented in the passage on the basis of their importance, as illustrated in the following example.

(This sample passage is much shorter than passages appearing in the test, but it is representative of data evaluation material.)

SAMPLE PASSAGE

Fred North, a prospering hardware dealer in Hillidale, Connecticut, felt that he needed more store space to accommodate a new line of farm equipment and repair parts that he intended to carry. A number of New York City commuters had recently purchased tracts of land in the environs of Hillidale and there had taken up farming on a small scale. Mr. North, foreseeing a potential increase in farming in that area, wanted to expand his business to cater to this market. North felt that the most feasible and appealing recourse open to him would be to purchase the adjoining store owned by Mike Johnson, who used the premises for his small grocery store. Johnson's business had been on the decline for over a year since the advent of a large supermarket in the town. North felt that Johnson would be willing to sell the property at reasonable terms, and this was important since North, after the purchase of the new merchandise, would have little capital available to invest in the expansion of his store.

Consider each item separately in terms of the passage and choose

A if the item is a MAJOR OBJECTIVE in making the decision, that is, one of the outcomes or results sought by the decision-maker;
B if the item is a MAJOR FACTOR in making the decision, that is, a consideration, explicitly mentioned in the passage, that is basic in determining the decision;
C if the item is a MINOR FACTOR in making the decision, that is, a secondary consideration that affects the criteria tangentially, relating to a Major Factor rather than to an Objective;
D if the item is a MAJOR ASSUMPTION in making the decision, that is, a supposition or projection made by the decision-maker before weighing the variables;
E if the item is an UNIMPORTANT ISSUE in making the decision, that is, a factor that is insignificant or not immediately relevant to the situation.

SAMPLE DATA EVALUATION QUESTIONS

(D) Increase in farming in the Hillidale area
(A) Acquisition of property for expanding store
(B) Cost of Johnson's property
(C) State of Johnson's grocery business
(E) Quality of the farm equipment North intends to sell

A second set of questions, data application, requires judgments based on a comparison of the available alternatives in terms of the relevant criteria, in order to attain the objectives stated in the passage.

Each of the following questions relates to the passage. For each question, choose the best answer.

SAMPLE DATA APPLICATION QUESTIONS

I. Potential demand for farm equipment in the Hillidale area
II. Desire to undermine Mike Johnson's business
III. Higher profit margin on farm equipment than on hardware goods

(A) I only*  (B) III only  (C) I and II only
(D) II and III only  (E) I, II, and III

Usage

Introduced: 10/76
(Currently in use)

Directions: The following sentences contain problems in grammar, usage, diction (choice of words), and idiom. Some sentences are correct. No sentence contains more than one error.

You will find that the error, if there is one, is underlined and lettered. Assume that all other elements of the sentence are correct and cannot be changed. In choosing answers, follow the requirements of standard written English.

24

If there is an error, select the one underlined part that must be changed in order to make the sentence correct, and blacken the corresponding space on the answer sheet.

If there is no error, mark answer space E.

(C) He spoke bluntly and angrily to we spectators. No error
(A) He works every day so that he would become financially independent in his old age. No error

Reading Comprehension

Introduced: 10/77
(Currently in use)

Directions: Each passage in this group is followed by questions based on its content. After reading a passage, choose the best answer to each question and blacken the corresponding space on the answer sheet. Answer all questions following a passage on the basis of what is stated or implied in that passage.

One sample reading passage follows. (It is much shorter than passages in the test, but it illustrates their general nature.)

SAMPLE PASSAGE

Not until the mid-1960's did any agriculturally based unions in the Southwest show promise of sustained operation. Mexican Americans were involved in efforts during the 1920's and 1930's to establish farm labor unions, but although these efforts resulted in dramatic and partially successful strikes, they were episodic and without organizational continuity.

The migratory work pattern compounded the problem of labor organization. A dispersed population in motion is not an easy target for organizational appeals. Industrial unionism had the advantage of mobilizing workers who were more concentrated in workplace and residence. It was easier to organize a work force whose members filed in and out of the workplace at fixed locations and at fixed times and who were exposed to daily contact with organizers. In addition, the multiple-employer structure of farm work, partially a function of labor mobility, made it less likely that union gains in one area could be transferred to another.

QUESTION ON READING PASSAGE

According to the passage, when did the first efforts of Mexican Americans to form agricultural unions take place?

(A) At the turn of the century
(B) During the 1920's*
(C) During the 1930's
(D) Immediately after the Second World War
(E) During the 1960's


APPENDIX B

Calculating Reliability Coefficients and Equating Parameters

Calculating Reliability Coefficients

For the relatively unspeeded homogeneous subtests of the type that constitute GMAT, the reliability coefficient for each subtest can readily be calculated. All necessary data for these calculations are readily obtained when the test is given at a regular test administration. Once the reliability of each subtest has been determined, the reliability of GMAT Total, Verbal, and Quantitative scores can also be determined by well-established methods.

The method used for calculating subtest reliabilities is based on the assumption that the intercorrelations of all items included in the subtest are substantially uniform, as would be expected to happen if all items are measuring the same basic ability. For example, if each item on a reading comprehension test were correlated with each other item, the whole set of intercorrelations would be expected to be essentially uniform. Under these conditions, convenient formulas for calculating the reliability coefficient can be derived.

The particular formula used in calculating reliability coefficients for GMAT subtests (called Kuder-Richardson Formula No. 20) has been adapted for use with formula-scored tests. To calculate the reliability of a GMAT part, the following items of information are needed:

(a) the number of items (called "n"),
(b) the number of right answers for each item (called "R"),
(c) the number of wrong answers for each item (called "W"),
(d) the fraction by which wrong answers are multiplied before subtracting from the number of right answers in calculating formula scores (called "k"),
(e) the standard deviation of formula scores (called "σ"), and
(f) the number of examinees in the sample (called "N").

In describing the process of calculating a reliability coefficient, formulas for using the item analysis results may be considered first. In these formulas, the symbol "Σ" means that the results are to be summed over all items. The formulas are:

A = [N × ΣR − Σ(R²)] / N²

B = [N × ΣW − Σ(W²)] / N²

C = Σ(RW) / N²

It may be useful to express the first formula in words. In calculating A, we begin by summing the number of right answers for all items and multiplying the result by the number of examinees. From that result, we subtract the sum of the squares of the numbers of right answers for all items. The resulting number is then divided by the square of the number of examinees.

The values calculated by these three formulas are then substituted in the following formula:

Reliability = [n / (n − 1)] × [1 − (A + k²B + 2kC) / σ²]

The following numerical sample illustrates the calculation of the reliability of the Reading Comprehension subtest. Because the test is composed of 25 five-choice items, n is equal to 25, and k is equal to .25. The basic data for the calculations are obtained from the computer printout, as follows:

N = 2,080
ΣR = 31,456
Σ(R²) = 42,480,212
ΣW = 15,961
Σ(W²) = 12,427,015
Σ(RW) = 17,731,905
σ = 5.3847

From these figures, we can compute values for A, B, and C, as follows:

A = [(2,080)(31,456) − 42,480,212] / (2,080)² = 5.3042

B = [(2,080)(15,961) − 12,427,015] / (2,080)² = 4.8012

C = 17,731,905 / (2,080)² = 4.0985

The reliability for the Reading Comprehension part may now be calculated, as follows:

Reliability = (25/24) × [1 − (5.3042 + (.25)²(4.8012) + 2(.25)(4.0985)) / (5.3847)²]

The resulting value for the reliability of this part is .767. Similar calculations provide reliability coefficients for each of the other parts.
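The same calculation can be written compactly in code. The sketch below (Python; the function and argument names are ours) reproduces the .767 result from the printout figures above.

    def formula_score_reliability(n, k, N, sum_R, sum_R2, sum_W, sum_W2, sum_RW, sd):
        # Adapted Kuder-Richardson Formula No. 20 for formula-scored tests.
        A = (N * sum_R - sum_R2) / N ** 2
        B = (N * sum_W - sum_W2) / N ** 2
        C = sum_RW / N ** 2
        return (n / (n - 1)) * (1 - (A + k ** 2 * B + 2 * k * C) / sd ** 2)

    # Reading Comprehension subtest: 25 five-choice items, so k = 1/4.
    rel = formula_score_reliability(
        n=25, k=0.25, N=2080,
        sum_R=31456, sum_R2=42480212,
        sum_W=15961, sum_W2=12427015,
        sum_RW=17731905, sd=5.3847)
    print(round(rel, 3))  # 0.767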

Once the reliability of each part has been calculated, it becomes possible to calculate the reliability of the total score. For this purpose, we need the standard error of measurement of each part. The standard error of measurement can be defined by the following formula:

Standard Error of Measurement = σ × √(1 − reliability)

It can be shown that the reliability of the total score equals:

1 − (Sum of squared standard errors of measurement) / (Squared standard deviation of total score)


Table B1 illustrates the calculation of the reliability of the Total score for a recent form of GMAT. A similar procedure is followed in calculating the reliability of the Verbal and Quantitative part scores.

Table B1

Calculation of Reliability Coefficient of GMAT Total Score

I. Basic Data

                       Reliability    Standard     Standard Error    Squared Standard Error
Part                   of Part        Deviation    of Measurement    of Measurement

Reading
Comprehension          .7667          5.3847       2.6008            6.7642
Problem Solving        .7489          4.6941       2.3520            5.5319
Practical Judgment     .6219          3.9185       2.4094            5.8052
Data Sufficiency       .8029          6.1773       2.7428            7.5233
Usage                  .7626          5.5644       2.7111            7.3501
Practical Judgment     .5889          3.8463       2.4663            6.0826

Total of Squared Standard Errors of Measurement: 39.0570

The standard deviation of total scores is 21.8836. These values were calculated using more decimal places than are shown in the table.

II. Calculations

Reliability = 1 − (Sum of squared standard errors of measurement) / (Squared standard deviation of total scores)

            = 1 − 39.0570 / (21.8836)² = 1 − 39.0570 / 478.8919 = .918
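A companion sketch in Python for the total-score computation, using the part reliabilities and standard deviations from Table B1:

    import math

    # (reliability, standard deviation) for each part, from Table B1.
    parts = [(.7667, 5.3847), (.7489, 4.6941), (.6219, 3.9185),
             (.8029, 6.1773), (.7626, 5.5644), (.5889, 3.8463)]
    total_sd = 21.8836

    # Standard error of measurement of each part: sd * sqrt(1 - reliability).
    squared_sems = [(sd * math.sqrt(1 - rel)) ** 2 for rel, sd in parts]

    total_reliability = 1 - sum(squared_sems) / total_sd ** 2
    print(round(total_reliability, 3))  # 0.918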

Calculating Equating Parameters

The basic method of equating used for GMAT scores from 1962 to the present time makes use of the well-known statistical principle that if a very large group is divided at random into two or more subgroups, the resulting subgroups will be quite similar in every characteristic. In applying this method, we administer a form for which scaled scores are already known to one random subgroup and we administer the new form to a different random subgroup. Assuming that the two random subgroups are equal in the ability measured by GMAT, we attribute differences in scores between the two forms to a difference in the difficulty of the two tasks.

In order to relate raw scores on a new form to the GMAT scale, we need to know:

(a) the equation relating raw scores to scaled scores for the old form;

(b) the mean and standard deviation of raw scores for the random subgroup of examinees who took the old form; and

(c) the mean and standard deviation of raw scores for the random subgroup of examinees who took the new form.

The following data were available for 9,850 examinees who took the old form and 9,795 examinees who took the new form:

                       Old Form      New Form
Mean                   65.765279     63.586932
Standard Deviation     22.795999     23.601557

Equating files show that the equation for converting raw scores to scaled scores for the old form has a multiplier (A0) of 4.6369 and an additive constant (B0) of 167.0300.

To solve for the multiplier for the new form, we use the formula

AN = A0 × (σO / σN),

where σO and σN are the standard deviations for the old and new forms, so that

AN = 4.6369 × (22.795999 / 23.601557) = 4.478635.

This value agrees with the computer-determined value of 4.4786.

To solve for the additive constant for the new form, we use the formula

BN = A0·MO − AN·MN + B0,

where MO and MN are the means for the old and new forms, so that

BN = (4.6369)(65.765279) − (4.478635)(63.586932) + 167.0300
   = 304.9470 − 284.7827 + 167.0300
   = 187.1943.

As it turned out, the value obtained when BN was calculated by the computer was also 187.1943, although the sample calculation did not reproduce in detail the calculations performed by the computer.
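The equating computation can likewise be checked in a few lines. A sketch in Python, using the values given above (the variable names are ours):

    # Old-form conversion: scaled score = A0 * raw score + B0.
    A0, B0 = 4.6369, 167.0300
    mean_old, sd_old = 65.765279, 22.795999
    mean_new, sd_new = 63.586932, 23.601557

    # Match the scaled-score standard deviations of the two random subgroups.
    A_new = A0 * sd_old / sd_new
    # Match the scaled-score means.
    B_new = A0 * mean_old - A_new * mean_new + B0

    print(round(A_new, 6))  # 4.478635
    print(round(B_new, 2))  # 187.19, matching the text's 187.1943 to rounding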
