

Journal of Educational Measurement
Fall 2009, Vol. 46, No. 3, pp. 330–343

The Circle-Arc Method for Equating in Small Samples

Samuel A. Livingston and Sooyeon Kim
Educational Testing Service

Copyright © 2009 by the National Council on Measurement in Education

This article suggests a method for estimating a test-score equating relationship from small samples of test takers. The method does not require the estimated equating transformation to be linear. Instead, it constrains the estimated equating curve to pass through two pre-specified end points and a middle point determined from the data. In a resampling study with two test forms that differed substantially in difficulty, the proposed method compared favorably with other equating methods, especially for equating scores below the 10th percentile and above the 90th percentile.

The Problem

Often, it is necessary to equate scores on a new form of a test taken by a very small number of test takers: 30, 20, or even fewer. Equating test scores on the basis of such a small sample of test takers is likely to produce results that will not generalize well to other groups of test takers. As the statisticians responsible for the equating of the scores, we cannot define the problem out of existence by claiming that the small group of test takers whose new-form scores we can observe are the entire target population (i.e., the population for which we want the equating to be correct). A test taker’s reported score should not depend heavily on the particular group of test takers who happened to take the test at the same time. To avoid that kind of dependence, we need an equating relationship that will generalize, at least approximately, to other groups of test takers.

Most of our own experience in equating with very small samples has been with teacher-certification tests. Each state decides what tests to require for certification in a given subject. In this testing program, small-sample equating situations typically occur when a new test has been adopted by only a small number of states, particularly if the subject is one in which not many new teachers are certified each year (e.g., physics). We have encountered several situations with fewer than 30 people taking the new form and some situations with fewer than 20. Although the tests are built to detailed content specifications, the test makers have no opportunity to pretest the items on large samples of test takers. Consequently, a new test form may be substantially harder or easier than the form to which it is being equated.

Statisticians often deal with the problem of small samples by using “strong models” that reduce the number of parameters to estimate from the data, in effect substituting assumptions for data. In test-score equating, a common strong model is the linear equating model, which assumes that, in the target population, the distributions of scores on the test forms to be equated differ only in their means and standard deviations (SD). For very small samples, there is an even stronger model, “mean equating” (Kolen & Brennan, 2004, p. 125), which assumes that those distributions differ only in their means. But when test forms differ substantially in difficulty, the score distributions tend to differ in shape, so that the equating relationship is not linear, not even approximately linear. A difficult form and an easy form, administered to the same group of test takers, will produce differently skewed distributions. The difficult form will spread out the scores of the higher-ability test takers and bunch together the scores of the lower-ability test takers. The easy form will do the opposite. Consequently, the slope of the equating transformation will not be the same for the weaker test takers as it is for the stronger test takers. A linear transformation, with its constant slope, cannot capture this aspect of the equating relationship. The inaccuracies tend to be greatest at the ends of the score distribution, where the linear equating transformation may extend beyond the range of scores possible on the reference form. We need a better method.

Previous Research

Only a few previous studies have investigated the performance of existing methods for equating in small samples of test takers. Typically, these have been resampling studies, in which a pair of test forms is equated in large groups of test takers and then in repeated small samples from the large groups. Table 1 briefly summarizes five such studies. Livingston (1993) found that presmoothing increased the accuracy of equipercentile equating about as much as doubling the sample size. Hanson, Zeng, and Colton (1994) found that “. . . both presmoothing and postsmoothing methods can improve estimation of the equipercentile equating function” (p. ii). Parshall, Houghton, and Kromrey (1995) found “. . . substantial increases in standard errors as sample size decreases” (p. 37). Skaggs (2005) found mean equating to be more accurate than the other methods for below-average scores but not as accurate as the other methods for above-average scores. Kim, von Davier, and Haberman (2008) found that with samples of 10, 25, and 50 test takers, the identity was more accurate than the chained linear equating; with samples of 200 test takers, an equally weighted combination of the chained linear equating and the identity was more accurate than either of its two components.

TABLE 1
Five Previous Studies of Equating in Small Samples

| Study | Equating Design(s) | Sample Sizes Investigated | Equating Method(s) | Pairs of Forms Investigated |
|---|---|---|---|---|
| Livingston (1993) | Common items | 25, 50, 100, 200 | Chained equipercentile with and without presmoothing | 1 |
| Hanson et al. (1994) | Random groups | 100, 250, 500, 1,000, 3,000 | Equipercentile with various pre- and postsmoothing methods | 5 |
| Parshall et al. (1995) | Common items | 15, 25, 50, 100 | Levine-Angoff (linear) | 5 |
| Skaggs (2005) | Random groups | 25, 50, 75, 100, 150, 200 | Equipercentile with and without presmoothing, linear, mean, identity | 2 |
| Kim et al. (2008) | Common items, external anchor | 10, 25, 50, 100, 200 | Chained linear, identity, “synthetic” (average of chained linear & identity) | 2 |

A Proposed Solution

The basic idea behind our method comes from a method proposed by Divgi (1987). His idea was to constrain the estimated equating curve to pass through two pre-specified end points and an empirically determined middle point. In Divgi’s method, the end points were determined by the maximum and minimum possible scores on the test forms to be equated. The middle point was determined by the mean scores. The estimated equating transformation was a cubic polynomial passing through those three points, with the slope of the curve at the middle point determined by the ratio of the SD.

The method proposed here is somewhat different, but it also constrains the equating curve to pass through two pre-specified end points and an empirically determined middle point. The upper end point of the curve is determined by the maximum possible score on each form. The lower end point of the curve is determined by the lowest meaningful score on each form. On a multiple-choice test, the lowest meaningful score would typically be the “chance score”: the expected score for a test taker who answers without reading the questions. The middle point on the curve is determined from the data, by equating at one point in the middle of the score distribution. If those three points happen to lie on a straight line, that line is the estimated equating curve. If the three points do not lie on a straight line, they determine an arc of a circle. It would be possible to use this arc as an estimate of the equating curve in the interval between the end points. However, there is an alternative procedure that is computationally simpler and produces a curve that more closely resembles the curves that result from equipercentile equating in large groups. Borrowing an idea from von Davier, Holland, and Thayer (2004, pp. 11–13), we decompose the equating function into a linear component and a curvilinear component. We then use a circle arc as the model for the curvilinear component. The circle arc is a better model for the deviation of the equating curve from linearity than for the shape of the curve itself. To extend the equating function below the lowest meaningful score, the lower end point of the curve can be connected linearly to the point representing the minimum possible scores on the two forms.

Figures 1–4 illustrate the method with a hypothetical example, in which a new form of a test consisting of 20 four-choice items is being equated to a reference form also consisting of 20 four-choice items. The three points that determine the circle arc are indicated by the square boxes. The upper end point of the curve, determined by the maximum possible scores, is specified to be (20, 20). The lower end point of the curve, determined by the chance scores, is specified to be (5, 5). The middle point has been determined from the data to be (12, 14), which implies that the new form is more difficult than the reference form.

FIGURE 1. Three points determine the equating curve. (Horizontal axis: new form (X), 0 to 20; vertical axis: reference form (Y), 0 to 20. The plot marks the line y = L(x) and the points (x1, y1), (x2, y2), and (x3, y3).)

FIGURE 2. Transforming the three points. (Same axes as Figure 1. The plot marks the original points (x1, y1), (x2, y2), (x3, y3) and the transformed points (x1, y*1), (x2, y*2), (x3, y*3).)

Figure 1 shows the three points that determine the equating curve. We will refer to the lower end point as $(x_1, y_1)$, the upper end point as $(x_3, y_3)$, and the middle point, determined by the data, as $(x_2, y_2)$. Figure 1 also shows the straight line connecting the two pre-specified end points. This line serves as the linear component of the equating function; we will call it $L(x)$. Algebraically,

$$L(x) = y_1 + \frac{y_3 - y_1}{x_3 - x_1}\,(x - x_1).$$

If the new form and the reference form are alike in test length, item format, and scoring system, $L(x)$ will be the identity line.

In Figure 2, the three points are transformed by subtracting the height of the line $L(x)$. We will use the notation $y^*$ to represent the height of the transformed point:

$$y^* = y - L(x).$$

Because the two end points of the curve are on the line $L(x)$, this step transforms their $y$ values to zero: $y^*_1 = y^*_3 = 0$.

The height of the transformed middle point $(x_2, y^*_2)$, near the bottom of Figure 2, is equal to the vertical distance of the original middle point $(x_2, y_2)$ above the line connecting the two end points. This distance depends on what the small-sample equating indicates about the difficulty of the new form and the reference form. If the small-sample equating indicates that the new form is harder than the reference form, the middle point will be above the line connecting the end points, and $y^*_2$ will be positive. If the small-sample equating indicates that the new form is easier than the reference form, the middle point will be below the line connecting the end points, and $y^*_2$ will be negative. In this example, the three transformed points are (5, 0), (12, 2), and (20, 0).

FIGURE 3. Fitting the circle arc to the three transformed points. (Same axes as Figure 1. The plot marks the original points (x1, y1), (x2, y2), (x3, y3) and the transformed points (x1, y*1), (x2, y*2), (x3, y*3).)
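The transformation step can be sketched in a few lines of Python. This is only an illustration of the formulas for L(x) and y*; the function names `linear_component` and `transform` are hypothetical and do not come from the article.

```python
def linear_component(x, lower, upper):
    """Height of the line L(x) through the two pre-specified end points."""
    (x1, y1), (x3, y3) = lower, upper
    return y1 + (y3 - y1) / (x3 - x1) * (x - x1)


def transform(point, lower, upper):
    """Return (x, y*), where y* = y - L(x)."""
    x, y = point
    return (x, y - linear_component(x, lower, upper))


# End points and middle point from the hypothetical 20-item example.
lower, middle, upper = (5, 5), (12, 14), (20, 20)
transformed = [transform(p, lower, upper) for p in (lower, middle, upper)]
print(transformed)  # [(5, 0.0), (12, 2.0), (20, 0.0)]
```

The two end points land on zero because they lie on L(x) by construction; only the middle point carries information from the data.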

In Figure 3, the three transformed points are used to determine a circle arc, and this arc determines the value of $y^*$ at each new-form raw score. If $r$ represents the radius of the circle, and the coordinates of its center are $(x_c, y_c)$, the equation of the circle is

$$(X - x_c)^2 + (Y - y_c)^2 = r^2.$$

If $y^*_2$ is positive, the formula for the arc is

$$Y^* = y_c + \sqrt{r^2 - (X - x_c)^2}.$$

If $y^*_2$ is negative, the formula for the arc is

$$Y^* = y_c - \sqrt{r^2 - (X - x_c)^2}.$$

FIGURE 4. Retransforming the points determined by the circle arc. (Horizontal axis: new form (X), 0 to 20; vertical axis: reference form (Y), 0 to 20. The plot marks the points (x1, y1), (x2, y2), and (x3, y3).)

The coordinates of the center point are

$$x_c = \frac{x_3^2 - x_1^2}{2\,(x_3 - x_1)};$$

$$y_c = \frac{(x_1^2)(x_3 - x_2) - \left(x_2^2 + (y^*_2)^2\right)(x_3 - x_1) + (x_3^2)(x_2 - x_1)}{2\left[\,y^*_2\,(x_1 - x_3)\right]},$$

and the radius is

$$r = \sqrt{(x_c - x_1)^2 + (y_c)^2}.$$

In this example, the three transformed points determine a circle with center (12.5, –13) and radius $\sqrt{225.25} = 15.01$.
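As a check on the center and radius formulas, the following Python sketch evaluates them for the example points (5, 0), (12, 2), and (20, 0). The function name `circle_through` is hypothetical, and the explicit error for a zero y*2 is an added safeguard rather than part of the article.

```python
import math


def circle_through(x1, x2, x3, y2_star):
    """Center (xc, yc) and radius r of the circle through
    (x1, 0), (x2, y2_star), and (x3, 0)."""
    if y2_star == 0:
        # The three points are collinear, so no circle is defined.
        raise ValueError("y2_star is zero: the equating curve is the line L(x)")
    xc = (x3**2 - x1**2) / (2 * (x3 - x1))
    yc = (
        (x1**2) * (x3 - x2)
        - (x2**2 + y2_star**2) * (x3 - x1)
        + (x3**2) * (x2 - x1)
    ) / (2 * y2_star * (x1 - x3))
    r = math.sqrt((xc - x1) ** 2 + yc**2)
    return xc, yc, r


xc, yc, r = circle_through(5, 12, 20, 2)
print(xc, yc, round(r, 2))  # 12.5 -13.0 15.01
```

Because both transformed end points have height zero, the center's horizontal coordinate reduces to the midpoint of x1 and x3.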

Finally, in Figure 4, the curve for $y^*$ is retransformed back to the score scale for the reference form, by adding in the height of the line $L(x)$. The resulting curve is an estimate of the equating function for scores between the specified end points. The estimated equating function is not a circle arc; it is the sum of a circle arc and a line with a positive slope of approximately 1.
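Putting the four steps together, a minimal Python sketch of the whole procedure might look like the following. It assumes the three points are supplied as (x, y) pairs; the function name `circle_arc_equate` and the optional `minimum` argument (the point of minimum possible scores, used for the linear extension below the lowest meaningful score) are hypothetical choices for this illustration.

```python
import math


def circle_arc_equate(x, lower, middle, upper, minimum=None):
    """Equated reference-form score for new-form score x."""
    (x1, y1), (x2, y2), (x3, y3) = lower, middle, upper
    if x < x1:
        # Below the lowest meaningful score: connect linearly to the minimum.
        if minimum is None:
            raise ValueError("x is below the lower end point")
        x0, y0 = minimum
        return y0 + (y1 - y0) / (x1 - x0) * (x - x0)
    if x > x3:
        raise ValueError("x is above the upper end point")
    slope = (y3 - y1) / (x3 - x1)
    line = y1 + slope * (x - x1)                 # linear component L(x)
    y2_star = y2 - (y1 + slope * (x2 - x1))      # transformed middle point
    if y2_star == 0:
        return line                              # three points are collinear
    xc = (x3**2 - x1**2) / (2 * (x3 - x1))
    yc = (
        (x1**2) * (x3 - x2)
        - (x2**2 + y2_star**2) * (x3 - x1)
        + (x3**2) * (x2 - x1)
    ) / (2 * y2_star * (x1 - x3))
    r_squared = (xc - x1) ** 2 + yc**2
    # max() guards against tiny negative values from floating-point rounding.
    root = math.sqrt(max(0.0, r_squared - (x - xc) ** 2))
    arc = yc + root if y2_star > 0 else yc - root
    return line + arc                            # retransform: add back L(x)


points = dict(lower=(5, 5), middle=(12, 14), upper=(20, 20), minimum=(0, 0))
equated = [circle_arc_equate(x, **points) for x in (2, 5, 12, 20)]
print(equated)  # [2.0, 5.0, 14.0, 20.0]
```

The curve reproduces the three defining points exactly, which is the constraint the method imposes; values between them come from the arc plus the line.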

Determining the Middle Point

If the equating design is a single-group, counterbalanced, or equivalent-groups design,¹ the middle point $(x_2, y_2)$ can be determined by equating the mean score on the new form directly to the mean score on the reference form.

In an anchor equating design, the middle point can be determined by chained linear equating at the mean score of the smaller group of test takers (typically, the group taking the new form). The chained linear equating formula for the reference-form score $y$ corresponding to new-form raw score $x$ is

$$y = m_{YB} + \frac{s_{YB}}{s_{VB}}\,(m_{VA} - m_{VB}) + \frac{s_{YB}}{s_{VB}}\,\frac{s_{VA}}{s_{XA}}\,(x - m_{XA}),$$

where $m$ and $s$ indicate the means and standard deviations, $A$ and $B$ indicate the test-taker groups taking the new form $X$ and reference form $Y$, and $V$ indicates the anchor score. This formula yields an equated score $y$ for any value of $x$, even if $x$ is not actually a possible score on the new form. Choosing $x_2 = m_{XA}$ simplifies the equating formula to

$$y_2 = m_{YB} + \frac{s_{YB}}{s_{VB}}\,(m_{VA} - m_{VB}).$$
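The chained linear formula and its simplification at the new-form mean can be sketched as below. The function names are hypothetical. The input values are the full-group statistics reported in Table 2 and are used here only to illustrate the arithmetic; in a small-sample application the new-form statistics would come from the small sample.

```python
def chained_linear(x, m_xa, s_xa, m_va, s_va, m_yb, s_yb, m_vb, s_vb):
    """Chained linear equating of new-form score x to the reference form."""
    return (
        m_yb
        + (s_yb / s_vb) * (m_va - m_vb)
        + (s_yb / s_vb) * (s_va / s_xa) * (x - m_xa)
    )


def middle_point(m_xa, m_va, m_yb, s_yb, m_vb, s_vb):
    """Middle point (x2, y2) with x2 chosen as the new-form mean."""
    return m_xa, m_yb + (s_yb / s_vb) * (m_va - m_vb)


# Illustrative inputs only (Table 2 full-group values).
stats = dict(m_xa=73.62, s_xa=10.51, m_va=30.60, s_va=4.96,
             m_yb=77.47, s_yb=10.83, m_vb=30.46, s_vb=5.09)

x2, y2 = middle_point(stats["m_xa"], stats["m_va"],
                      stats["m_yb"], stats["s_yb"],
                      stats["m_vb"], stats["s_vb"])
print(x2, round(y2, 2))  # 73.62 77.77
```

Note that the simplified form does not use the new-form SD or the new-form anchor SD, which is why it needs only two statistics from the small sample.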

This method of determining the middle point requires only two pieces of information from the sample of test takers taking the new form: their mean score on the new form ($m_{XA}$) and their mean score on the anchor ($m_{VA}$). It requires four pieces of information from the sample of test takers taking the reference form: the means and standard deviations of their scores on the reference form ($m_{YB}$, $s_{YB}$) and on the anchor ($m_{VB}$, $s_{VB}$). In most cases, when the new form and the reference form have been taken by substantially different numbers of test takers, the number of test takers who have taken the reference form is larger (often, the reference form has been administered more than once). However, it is possible for the group taking the new form to be larger than the group taking the reference form. In that case, if the difference between the group sizes is large enough to matter, it would be better to determine the middle point $(x_2, y_2)$ by specifying $y_2$ to be the mean score of that smaller group on the reference form and using chained linear equating to determine $x_2$, the corresponding score on the new form.

A Tryout of the Method

To see how the method would work in practice, we conducted a small-scale resampling study. The data came from a common-item equating of two forms of a 107-item, four-option multiple-choice test, based on data from more than 6,400 test takers taking each form. The test was a widely used certification test for prospective elementary school teachers, testing their knowledge of curriculum, instruction, and assessment. Table 2 shows some statistics describing the test takers’ scores on the anchor and on the full test. The groups taking the two forms were about equally strong; their mean scores on the common items differed by only .03 SD. However, on the full test, the mean score of the group taking the new form was .36 SD lower than that of the group taking the reference form, implying that the new form was substantially more difficult.
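The two standardized mean differences can be reproduced from the means and SDs in Table 2. The article does not say which SD was used as the denominator; the sketch below assumes a simple pooled SD (the square root of the average of the two variances), which is an assumption of this illustration, and the function name is hypothetical.

```python
import math


def standardized_mean_difference(mean_new, sd_new, mean_ref, sd_ref):
    """(new-form mean - reference-form mean) / pooled SD (assumed pooling)."""
    pooled_sd = math.sqrt((sd_new**2 + sd_ref**2) / 2)
    return (mean_new - mean_ref) / pooled_sd


anchor_diff = standardized_mean_difference(30.60, 4.96, 30.46, 5.09)
total_diff = standardized_mean_difference(73.62, 10.51, 77.47, 10.83)
print(round(anchor_diff, 2), round(total_diff, 2))  # 0.03 -0.36
```

Under this assumption the results agree with the reported values of +.03 and −0.36.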

The criterion equating for the resampling study was the chained equipercentile equating of the two forms, computed in the full group of 6,400+ test takers taking each form, with presmoothing of the joint distribution of total scores and anchor scores in each group. The presmoothing was a log-linear bivariate smoothing that preserved the first five univariate moments of each marginal distribution (i.e., of the total score and of the anchor score).

The purpose of the resampling study was to compare the circle-arc methods with other methods that might be used in small-sample situations. We included three linear equating methods: the Tucker method, the Levine observed-score method, and the chained linear method. We also included the mean equating method presented by Kolen and Brennan (2004, p. 125, equation 4.78) and another mean equating method based on the chained approach. The results of those methods were nearly identical, and only the results of Kolen and Brennan’s version will be reported here. Because the criterion equating was a chained equipercentile equating of presmoothed distributions, we also included that method among the small-sample methods to be compared. However, in presmoothing the small-sample score distributions, we used a stronger smoothing model, preserving only three univariate moments of each marginal distribution.

TABLE 2
Statistical Comparison of the Groups Taking the New Form and Reference Form

| | New-Form Test Takers | Reference-Form Test Takers |
|---|---|---|
| Number of test takers | 6,426 | 6,489 |
| Anchor-score mean | 30.60 | 30.46 |
| Anchor-score SD | 4.96 | 5.09 |
| Standardized mean difference (anchor score) | +.03 | |
| Total-score mean | 73.62 | 77.47 |
| Total-score SD | 10.51 | 10.83 |
| Standardized mean difference (total score) | −0.36 | |
| Correlation of total and anchor scores | .91 | .92 |

The resampling study consisted of 1,000 replications of the following procedure:

1. Draw a random sample of $n_x$ test takers who took the new form and a random sample of $n_y$ test takers who took the reference form.

2. In those samples of test takers, equate the new form to the reference form by all the selected equating methods.

Our choice of sample sizes was based on our own experience in equating small-volume tests. We decided to draw samples of 25 test takers who took the new form and 75 who took the reference form.
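The two-step procedure can be sketched as a simple loop. Everything specific in this sketch is an assumption made for illustration: the score data are synthetic, sampling is done without replacement (the article does not say), and `toy_mean_shift` is a placeholder that ignores the anchor and merely stands in for the equating methods actually compared in the study.

```python
import random
from statistics import mean


def resample(new_group, ref_group, equate, n_new=25, n_ref=75,
             replications=1000, seed=0):
    """Repeat: draw one sample from each group, then equate in those samples."""
    rng = random.Random(seed)
    results = []
    for _ in range(replications):
        new_sample = rng.sample(new_group, n_new)   # step 1: new-form sample
        ref_sample = rng.sample(ref_group, n_ref)   # step 1: reference sample
        results.append(equate(new_sample, ref_sample))  # step 2: equate
    return results


def toy_mean_shift(new_sample, ref_sample):
    """Placeholder method: the constant added to a new-form score."""
    return mean(ref_sample) - mean(new_sample)


# Synthetic total scores, NOT the study's data.
data_rng = random.Random(1)
new_group = [data_rng.gauss(73.6, 10.5) for _ in range(6000)]
ref_group = [data_rng.gauss(77.5, 10.8) for _ in range(6000)]

shifts = resample(new_group, ref_group, toy_mean_shift)
print(len(shifts))  # 1000
```

In a fuller version, `equate` would return an equated score for every raw-score level, so that accuracy could be summarized level by level across replications.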

Figure 5 shows how much the equated scores produced by the small-sample equatings differed from those produced by the criterion equating, as indicated by the root mean squared difference (RMSD) over the 1,000 replications (the formula for this statistic is in the Appendix). The units of the horizontal scale are raw-score points, but the vertical scale is expressed in SD units, magnified to show the differences among the small-sample methods. The horizontal scale of the graph extends from raw-score 30 (slightly above the chance score of 26.75) to 107, the maximum possible score. Only five of the 6,426 test takers taking the new form had scores lower than 38, and only four had scores higher than 100. The X marks on the horizontal scale indicate the 10th, 25th, 50th, 75th, and 90th percentiles of the new-form raw-score distribution in the criterion equating. Figure 5 includes a curve for each of the small-sample methods and a curve for the identity (i.e., no equating).

FIGURE 5. Resampling RMSD of small-sample equatings from population equating, expressed in terms of SD of scores. (Horizontal axis: raw score, 30 to 100; vertical axis: 0 to 0.6. Curves: chained equipercentile, Levine (observed), chained linear, Tucker, mean, circle-arc, identity. Marks: percentiles 10, 25, 50, 75, 90.)

Figure 5 shows that the differences between methods in accuracy were small for raw scores near the median of the distribution, but large for scores far from the median. In these small samples, the methods based on very strong assumptions (mean equating and circle-arc equating) clearly outperformed those based on weaker assumptions (the three linear equating methods and equipercentile equating). The most accurate method, overall, was the circle-arc method. In every part of the score range, its RMSD was either the smallest or very nearly the smallest. Its advantage over mean equating was substantial below the 10th percentile and above the 90th percentile.

Figures 6 and 7 decompose the RMSD into its two components, bias and variability. Figure 6 shows the bias in each small-sample method (and for the identity) at each score level, as indicated by the mean, over the 1,000 replications, of the differences between the small-sample equated scores and the criterion equating. Figure 7 shows the variability of each small-sample method, as indicated by the SD of the equated scores it produced over the 1,000 replications (the formulas for these statistics are in the Appendix).

FIGURE 6. Bias in small-sample equating by each method, expressed in terms of SD of scores. (Horizontal axis: raw score, 30 to 100; vertical axis: −0.4 to 0.4. Curves: chained equipercentile, Levine (observed), chained linear, Tucker, mean, circle-arc, identity. Marks: percentiles 10, 25, 50, 75, 90.)

Figure 6 shows that in the middle of the distribution, all the small-sample methods had a negative bias in equating this particular pair of test forms. They tended to produce equated scores that were too low, making too small an adjustment for the greater difficulty of the new form. This bias was smallest for the chained equipercentile method. Above the 75th percentile, all the linear methods, including mean equating, showed a positive bias that increased rapidly with the distance of the score from the median of the distribution. This result was inevitable, because of the curvilinearity in the criterion equating. The circle-arc method showed a negative bias in this portion of the score range. The chained equipercentile method showed less bias than the other methods, except in the lowest portions of the score range, well below the 10th percentile.

Figure 7 shows the main reason for the greater accuracy of the methods based on strong assumptions: smaller sampling variability. For mean equating, the SD over replications is constant over the entire score range. For circle-arc equating, the SD over replications decreases to zero as the raw score approaches the pre-specified end points of the estimated equating curve.

Table 3 shows summary statistics for the resampling RMSD, bias, and SD of each equating method, combining over score levels by weighting the value at each raw score according to the number of test takers with that raw score in the full data set. The summary statistic for the bias is a root mean square, which does not allow negative bias at some score levels to cancel out positive bias at other score levels. This measure of bias is the one that partitions the overall RMSD into two orthogonal components: $\text{RMSD}^2 = \text{bias}^2 + \text{SD}^2$. To provide an additional basis for comparison, Table 3 also includes these statistics computed for the identity, that is, for assuming the forms to be of equal difficulty instead of attempting to equate them.

FIGURE 7. Resampling standard deviation of equated scores by each method, expressed in terms of SD of scores. (Horizontal axis: raw score, 30 to 100; vertical axis: 0 to 0.6. Curves: chained equipercentile, Levine (observed), chained linear, Tucker, mean, circle-arc. Marks: percentiles 10, 25, 50, 75, 90.)

TABLE 3
Overall Values, in SD Units (Raw-Score Values Divided by the Reference-Form SD; See Appendix), for the Resampling RMSD, Bias, and Standard Deviation

| Equating Method | RMSD | Bias | SD |
|---|---|---|---|
| Identity (no equating) | .387 | .387 | .000 |
| Chained equipercentile | .199 | .036 | .196 |
| Levine | .159 | .045 | .153 |
| Chained linear | .146 | .044 | .139 |
| Tucker | .145 | .046 | .137 |
| Mean | .104 | .044 | .094 |
| Circle arc | .097 | .036 | .090 |
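The partition of the overall RMSD into bias and SD can be checked directly against the rounded entries in Table 3. The sketch below recomputes each RMSD from the reported bias and SD; the tolerance of 0.0015 is an assumption chosen to allow for rounding to three decimals.

```python
import math

# (RMSD, bias, SD) as reported in Table 3.
table3 = {
    "Identity (no equating)": (0.387, 0.387, 0.000),
    "Chained equipercentile": (0.199, 0.036, 0.196),
    "Levine": (0.159, 0.045, 0.153),
    "Chained linear": (0.146, 0.044, 0.139),
    "Tucker": (0.145, 0.046, 0.137),
    "Mean": (0.104, 0.044, 0.094),
    "Circle arc": (0.097, 0.036, 0.090),
}

recomputed = {
    method: math.sqrt(bias**2 + sd**2)
    for method, (_, bias, sd) in table3.items()
}

for method, (rmsd, _, _) in table3.items():
    # Each reported RMSD should match sqrt(bias^2 + SD^2) up to rounding.
    assert abs(recomputed[method] - rmsd) < 0.0015, method
    print(f"{method}: reported {rmsd:.3f}, recomputed {recomputed[method]:.3f}")
```

For the circle-arc row, for example, the recomputed value is about .097, matching the reported RMSD.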

In the small-sample equating of this particular pair of test forms, the circle-arc method had the smallest overall RMSD, the smallest bias (tied with the chained equipercentile method), and the smallest SD over replications (except for the identity, which does not vary over replications).

Discussion

The resampling study reported above is intended as a demonstration of the circle-arc method, not a full evaluation. We intend to conduct more resampling studies, with other tests and with other equating designs and sample sizes. One application that appears promising is equating in an equivalent-groups design. The circle-arc method could make this equating design practical for situations in which the sample of test takers available would not be considered adequate for equating by conventional methods.

If the circle-arc method performs as well in subsequent studies as it did in our preliminary tryout, it could replace mean equating as the method of choice for small-sample equating situations. Its main advantage over mean equating is its accuracy in the upper and lower tails of the score distribution. Accuracy in these regions can be important, particularly if one of them includes a cutscore for classifying test takers.

Circle-arc equating may also be preferable to linear equating in situations where test forms differ in difficulty and the samples are too small for equipercentile equating. It may provide a solution to the problems caused by the scarcity (or absence) of data at the high and low ends of the score distribution. In the absence of data, the equipercentile equating in those regions is undefined. A smoothing model can be used to estimate score probabilities for these regions, but smoothing models are designed to provide a good fit to the data in the regions where the data are. The estimated score probabilities in the no-data regions of the score scale can lead to equating transformations that are highly implausible. The circle-arc method may offer a way out of this dilemma.


It seems appropriate to conclude by acknowledging that there is no theoretical basis for the circle-arc method as a method of equating. The rationale for the method is purely practical: when applied to small-sample data, it results in estimated equating curves that closely approximate the equipercentile curves computed from large-sample or population data. If this assertion is confirmed by further empirical studies, then the circle-arc method will be a valuable tool for equating with small samples of test takers.

Appendix

Formulas for the RMSD, Bias, and Standard Deviation over Replications

The following formulas compute the resampling RMSD, bias, and standard deviation of the equated score, expressed in raw-score points. The resulting values were divided by the reference-form SD (10.83) to convert them to standard-score units before plotting them in Figures 5–7 and entering them into Table 3.

Notation:

$i$ indexes the new-form raw-score levels, 0 to 107.

$j$ indexes the replications of the sampling and equating procedure, 1 to 1,000.

$y_{ij}$ represents the equated score for new-form raw score $x_i$, computed on the $j$th replication.

$y_i$ represents the equated score for new-form raw score $x_i$, computed from the full data set.

$\bar{y}_i$ represents the mean of $y_{ij}$ over replications of the resampling procedure: $\bar{y}_i = \frac{1}{1{,}000}\sum_{j=1}^{1{,}000} y_{ij}$.

$n$ represents the total number of test takers in the full data set who took the new form.

$n_i$ represents the number of test takers in the full data set having new-form raw score $x_i$.

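The resampling RMSD at raw-score level $i$ is

$$\text{RMSD}_i = \sqrt{\frac{1}{1{,}000}\sum_{j=1}^{1{,}000} (y_{ij} - y_i)^2},$$

and the overall resampling RMSD is

$$\sqrt{\frac{1}{n}\sum_{i=0}^{107} n_i\,\text{RMSD}_i^2} = \sqrt{\frac{1}{n}\sum_{i=0}^{107} n_i \left(\frac{1}{1{,}000}\sum_{j=1}^{1{,}000} (y_{ij} - y_i)^2\right)}.$$

The bias at raw-score level $i$ is

$$\text{bias}_i = \bar{y}_i - y_i,$$

and the overall (root mean squared) bias is

$$\sqrt{\frac{1}{n}\sum_{i=0}^{107} n_i\,(\bar{y}_i - y_i)^2}.$$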


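The resampling SD at raw-score level $i$ is

$$\text{SD}_i = \sqrt{\frac{1}{1{,}000}\sum_{j=1}^{1{,}000} (y_{ij} - \bar{y}_i)^2},$$

and the overall resampling SD is

$$\sqrt{\frac{1}{n}\sum_{i=0}^{107} n_i\,\text{SD}_i^2} = \sqrt{\frac{1}{n}\sum_{i=0}^{107} n_i \left(\frac{1}{1{,}000}\sum_{j=1}^{1{,}000} (y_{ij} - \bar{y}_i)^2\right)}.$$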
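The Appendix formulas translate fairly directly into code. The sketch below is one possible implementation, with hypothetical names and a tiny made-up data set (two score levels, two replications) chosen so that the results can be verified by hand; it returns values in raw-score points, before any division by the reference-form SD.

```python
import math


def resampling_summary(equated, criterion, counts):
    """Overall RMSD, bias, and SD over replications.

    equated[j][i] : equated score at level i on replication j
    criterion[i]  : equated score at level i from the full data set
    counts[i]     : number of test takers at level i (weights n_i)
    """
    reps = len(equated)
    n = sum(counts)
    levels = range(len(criterion))
    means = [sum(rep[i] for rep in equated) / reps for i in levels]
    rmsd_sq = [sum((rep[i] - criterion[i]) ** 2 for rep in equated) / reps
               for i in levels]
    bias_sq = [(means[i] - criterion[i]) ** 2 for i in levels]
    sd_sq = [sum((rep[i] - means[i]) ** 2 for rep in equated) / reps
             for i in levels]

    def overall(squares):
        # Weighted root mean square across score levels.
        return math.sqrt(sum(c * v for c, v in zip(counts, squares)) / n)

    return {"rmsd": overall(rmsd_sq), "bias": overall(bias_sq),
            "sd": overall(sd_sq)}


# Made-up toy inputs for hand checking.
summary = resampling_summary(
    equated=[[1.0, 3.0], [3.0, 5.0]],
    criterion=[2.0, 3.0],
    counts=[1, 3],
)
print(summary["sd"])  # 1.0
```

With these toy inputs the squared overall RMSD is 1.75, the squared overall bias is 0.75, and the squared overall SD is 1.0, so the partition into bias and SD holds exactly.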

Note

¹ This terminology for equating designs is consistent with that of Livingston (2004, pp. 27–35); Kolen and Brennan (2004, pp. 13–22) use a somewhat different set of terms.

References

Divgi, D. R. (1987). A stable curvilinear alternative to linear equating (Report CRC 571). Alexandria, VA: Center for Naval Analyses.

Hanson, B. A., Zeng, L., & Colton, D. (1994). A comparison of presmoothing and postsmoothing methods in equipercentile equating (ACT Research Report 94–4). Iowa City, IA: American College Testing, Inc.

Kim, S., von Davier, A. A., & Haberman, S. (2008). Small-sample equating using a synthetic linking function. Journal of Educational Measurement, 45, 325–342.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking (2nd ed.). New York: Springer.

Livingston, S. A. (1993). Small-sample equating with log-linear smoothing. Journal of Educational Measurement, 30, 23–29.

Livingston, S. A. (2004). Equating test scores (without IRT). Princeton, NJ: Educational Testing Service.

Parshall, C. G., Houghton, P. D. B., & Kromrey, J. D. (1995). Equating error and statistical bias in small sample linear equating. Journal of Educational Measurement, 32, 37–54.

Skaggs, G. (2005). Accuracy of random groups equating with very small samples. Journal of Educational Measurement, 42, 309–330.

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.

Authors

SAMUEL A. LIVINGSTON is a Senior Psychometrician, Educational Testing Service, Princeton, NJ 08541; [email protected]. His primary research interests include equating, reliability estimation, and constructed-response testing.

SOOYEON KIM is a Senior Psychometrician, Educational Testing Service, Princeton, NJ 08541; [email protected]. Her primary research interests include equating.
