
PERSONNEL PSYCHOLOGY 2012, 65, 385–428

UNLOCKING THE KEY TO BIODATA SCORING: A COMPARISON OF EMPIRICAL, RATIONAL, AND HYBRID APPROACHES AT DIFFERENT SAMPLE SIZES

JEFFREY M. CUCINA
U.S. Customs and Border Protection

PAT M. CAPUTO
Aon Hewitt

HENRY F. THIBODEAUX
U.S. Office of Personnel Management

CHARLES N. MACLANE
Independent Consulting, Arlington, VA

The criterion-related validities of empirical, rational, and hybrid keying procedures for a biodata inventory were compared at different sample sizes. Rational keying yielded the lowest validities. Hybrid keying performed best at the smallest sample sizes studied, followed by empirical keying at moderate sizes, and stepwise regression weighting of items at the largest sample sizes.

Biographical data (biodata) measures have a long history of success as selection tools, including a meta-analytic validity estimate of .35 (Schmidt & Hunter, 1998). Biodata are theoretically based on the assumption that past behavior is the best predictor of future behavior. In other words, applicants' behavior up to the point at which they are applying for a job should be a good predictor of their performance on the job after being hired. The questions in a biodata inventory target situations that are likely to have occurred in an individual's life. The questions in a biodata inventory can be quite broad in nature and often encompass past experiences, education, training, abilities, personality, attitudes, and interests.

The authors would like to thank Sharron C. Thompson for her valuable comments and suggestions on this paper. Various portions of this research have been presented at the annual conference of the Society for Industrial and Organizational Psychology in 2005, 2006, and 2007. The dataset we used in our study came from the validation dataset of the U.S. Office of Personnel Management's Individual Achievement Record, which is described in Gandy et al.'s (1994) chapter of the Biodata Handbook.

Correspondence and requests for reprints should be addressed to Jeffrey M. Cucina, PhD, Personnel Research and Assessment Division, U.S. Customs and Border Protection, 1400 L Street, NW 7th Floor, Washington, DC 20229-1145; [email protected] or [email protected].
© 2012 Wiley Periodicals, Inc.


Establishing a scoring key is a critical step in creating a biodata measure. Because biodata inventories are often heterogeneous in nature (both in content and item type), it is common for biodata developers to differentially weight each item response depending on how well it relates to job performance. Item responses can be weighted subjectively, using ratings of the estimated relationship between responses and performance (i.e., rational keying), or objectively, using empirical estimates of this relationship (i.e., empirical keying). In empirical keying, items and options are weighted based on the empirical relationship between endorsement and scores on a criterion. Rational keying relies on theoretical linkages between items and performance domains or expert judgments regarding the predictive validity of response option endorsement (Guion, 1965). Oftentimes, a hybrid approach is adopted, whereby both empirical and rational information are used to establish the scoring key.

In addition to choosing an empirical, rational, or hybrid approach, a biodata developer must also make a number of other important decisions when creating a scoring key. If an empirical or hybrid approach is chosen, then the developer must choose among numerous empirical keying procedures. The developer must also determine how large a sample is needed to create an empirical or hybrid key and whether items should be unit or empirically weighted. Shrinkage with empirical and hybrid keys must likewise be taken into account. The purpose of this paper is to shed some light on these issues through the following seven research questions.

1. Which scoring approach (i.e., rational, empirical, or hybrid) yields the highest criterion-related validity?

2. Do the different empirical keying procedures (e.g., vertical percent, point biserial, mean criterion) have different criterion-related validities?

3. Which biodata scoring procedures should practitioners use (considering factors such as validity, feasibility, and legal defensibility)?

4. Do the different biodata scoring procedures yield similar (i.e., highly correlated) scores?

5. Does sample size impact the validity of the different biodata scoring procedures?

6. What are the sample size requirements for empirical and hybrid keying?

7. Does hybrid keying decrease the sample size requirements?

The remainder of the Introduction serves to describe each of the seven research questions.


Research Question #1: Which Scoring Approach (i.e., Rational, Empirical, or Hybrid) Yields the Highest Criterion-Related Validity?

The selection of an appropriate scoring procedure for biodata measures is a source of some debate. Proponents of empirical keying make several arguments in terms of the predictive ability of empirically keyed biodata inventories. Several researchers maintain that empirical keying is ideal when maximizing prediction is the goal rather than maximizing an understanding of theory, relationships, and constructs (Guion, 1965; Mitchell & Klimoski, 1982). Mitchell and Klimoski argue that the rational approach to scoring biodata items makes untenable assumptions (e.g., linearity) and does not maximize predictive validity. Hogan (1994) notes that the meaning of scores on an empirically keyed biodata inventory can only be interpreted in terms of predictive efficiency, which argues against a rational approach.

In contrast, proponents of rational keying also make several arguments for the use of rational keying. Hogan (1994) notes that empirical keying of biodata instruments has come under attack (e.g., Dunnette, 1962), mainly because empirical keying does little to assist in theory development. Although empirical keying may be the most predominant procedure used to establish biodata scoring keys (Hogan, 1994), rational keying is thought to better advance the understanding of the processes linking predictor items and job performance (Hough & Paullin, 1994). Moreover, rational scoring approaches are often favored for their legal defensibility (Sharf, 1994), increased theoretical understanding (Mumford & Stokes, 1992), and increased generalizability across jobs (Stokes, Hogan, & Snell, 1993).

Some psychologists suggest that the hybrid approach (which serves as a compromise between the rational and empirical approaches) may be the best approach to follow. With hybrid keying, both empirical and rational information are used to create the scoring key for a biodata measure. Sometimes, researchers will create empirical scoring weights for each response option and review the weights for consistency with theory. If the weights do not make theoretical sense, then the weights are modified or the item is dropped. Another approach is to combine rational and empirical weights (using unit weighting) to create a set of hybrid weights. For practical reasons, this is the approach taken in the current study, as a manual review of all item and option weights was not feasible.
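
The combination of rational and empirical weights described above can be sketched in a few lines. This is an illustrative sketch with invented weights, not the study's data or code: each source of weights is standardized and the two are then unit weighted (summed).

```python
# Illustrative sketch (invented weights, not study data) of combining
# rational and empirical option weights: standardize each set of weights
# separately, then sum them so each source has equal influence.
def standardize(weights):
    n = len(weights)
    mean = sum(weights) / n
    sd = (sum((w - mean) ** 2 for w in weights) / n) ** 0.5
    return [(w - mean) / sd for w in weights]

rational = [7, 3, 5, 6, 2]                    # hypothetical expert ratings
empirical = [0.09, -0.02, 0.01, 0.05, -0.06]  # e.g., point biserial r's

hybrid = [r + e for r, e in zip(standardize(rational), standardize(empirical))]
```

With unit weighting of the two standardized sources, an option favored by both the experts and the data (the first option here) ends up with the largest hybrid weight.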

Past research comparing empirical and rational biodata keys is mixed, with some studies supporting empirical keying and others supporting rational keying. Some researchers have argued that empirical keying approaches display greater criterion-related validity (Karas & West, 1999; Mitchell & Klimoski, 1982); however, other research suggests that rational keying is as viable as empirical keying (Schoenfeldt, 1999; Stokes & Searcy, 1999). The discrepancy between previous research studies may result from two particular limitations. First, each study relied on a single empirical keying method, and the keying method varied across studies. Second, sample sizes varied across the studies. It is quite possible that sample size may impact the differences between the criterion-related validities of the empirical, hybrid, and rational approaches. In practice, the choice of rational or empirical keying procedures is often driven by logistical considerations, particularly the availability of a sufficient validation sample. If either the empirical or hybrid approach is selected, it is necessary to obtain a large enough sample to generate empirically derived scoring weights. On the other hand, rational keying does not require a large sample to generate a scoring key because it uses theoretical judgments of item response option values.

Research Question #2: Do the Different Empirical Keying Procedures Have Different Criterion-Related Validities?

Psychologists using empirical and hybrid keying must choose between numerous empirical keying procedures and have little empirical research to guide their choice. In this section, we will briefly describe the different empirical keying procedures that exist. Probably the best way to organize the large number of biodata scoring procedures is to place them into different approaches, classes, families, and methods, as depicted in Figure 1. Three approaches exist: empirical, rational, and hybrid (shown at the top of Figure 1). Three classes exist: option weighting, item weighting, and combined option and item weighting. Within the option weighting class, there are four general families of methods: the percent family, the mean criterion family/method, the correlational family, and the rare response family/method. Within each family, there are a number of different option weighting methods; for example, the correlational family includes the point biserial unit weights method, the biserial raw weights method, and others. Descriptions, citations, and examples for each procedure are provided in Table 1, and in-depth reviews can be found in Hogan (1994) and Cucina, Bayless, Thibodeaux, Busciglio, and MacLane (2009).

The Option-Level Class of Empirical Keying Procedures

In the option-level class of empirical keying procedures, an empirically derived weight is generated for each response option for a biodata item. The weights depend on the empirical relationship between response option endorsement and a criterion. For example, if endorsement of option A is empirically associated with high job performance, then applicants who endorse option A will receive a higher score. If endorsement of option B has a negative relationship with job performance, then applicants endorsing option B will receive a lower score. There are a number of option-level empirical keying methods, each of which can be placed into four broad families.

Figure 1: An Illustration of the Differences Between Biodata Scoring Approaches, Classes, Families, and Methods.

Note. At the top of the figure, there are three rectangles corresponding to the three biodata scoring approaches: empirical, hybrid, and rational. Each of the three approaches can be broken down into two classes: option-level and item-level (the combined item- and option-level class is not detailed in this figure in the interest of parsimony). The classes can, in turn, be broken down into different families (e.g., the percent family and the correlational family). Finally, each of the families can further be broken down into different methods (e.g., the empirical vertical percent raw weights method).
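
The option-level scoring logic described above can be sketched briefly. The item names, options, and weights below are invented for illustration and are not from the study's inventory.

```python
# Minimal sketch of option-level biodata scoring (invented key and data):
# each response option carries its own keyed weight, and an applicant's
# scale score is the sum of the weights of the options they endorsed.
def score_biodata(responses, key):
    return sum(key[item][option] for item, option in responses.items())

# Hypothetical two-item key: option A of item1 relates positively to
# performance (+1.0), option B negatively (-1.0), and so on.
key = {
    "item1": {"A": 1.0, "B": -1.0, "C": 0.0},
    "item2": {"A": 0.5, "B": 0.0, "C": 1.5},
}
applicant = {"item1": "A", "item2": "C"}  # endorsed options
print(score_biodata(applicant, key))      # 1.0 + 1.5 = 2.5
```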

The Percent Family of Empirical Keying Methods

In the percent family, option weights are a function of the percentage of high and low criterion groups endorsing an option. In the vertical percent methods, the option weights are proportional to the difference between the percentage of high and low criterion group members who select the response option. The vertical percent methods (Strong, 1926; see also England, 1961, 1971) are the most commonly used and cited family


TABLE 1
Descriptions and Examples of Keying Procedures Studied

Rational keying option weight*
Description: Developed by psychologists based on the theoretical relationship between item content and performance. Psychologists individually provide weights to each response option and then come to a consensus during a group meeting. (See Mumford & Stokes, 1992, for more information.)
Example: Weight = 7 out of 10

Vertical percent raw weights*
Description: Top and bottom 27% of participants (based on job performance) are selected. (If it was not possible to divide the criterion into two groups containing exactly 27% of the participants, then criterion groups that provided the best possible approximation to Kelley's [1939] 27% rule were used.) Response option weights are the raw difference in percent endorsement between these two groups. This method was first developed by Strong (1926); however, England (1961, 1971) and Stead and Shartle (1940) also provide extensive information.
Example: Weight = 18.2% − 9.5% = 8.7

Vertical 5%*
Description: Same as above, except that unit weights are used. Differences of 5% or more yield weights of +/−1; all other differences yield weights of 0. (See England, 1961, 1971; Stead & Shartle, 1940; and Strong, 1926, for more information.)
Example: Because 8.7% ≥ 5%, the weight is +1.

Vertical 10%*
Description: Same as above, except differences of 10% or more yield weights of +/−1; all other differences yield weights of 0. (See England, 1961, 1971; Stead & Shartle, 1940; and Strong, 1926, for more information.)
Example: Because |8.7%| < 10%, the weight is 0.

Vertical percent net weights*
Description: Similar to above, except that the unit weights take into account the number of respondents in each group. The net weights are tabulated using Strong's (1926) tables (reproduced in England, 1961, 1971; Stead & Shartle, 1940, p. 255).
Example: Per Strong's tables, the net weight is 2.

Vertical percent assigned weights*
Description: Similar to above, except that assigned weights are derived from net weights and are on a 0 to 2 integer scale. A net weight of −4 or less corresponds to an assigned weight of 0; a net weight of −3 to 3 corresponds to an assigned weight of 1; a net weight of 4 or more corresponds to an assigned weight of 2 (England, 1961, 1971).
Example: Because the net weight of 2 falls between −3 and 3, the assigned weight is 1.

Horizontal percent raw weights*
Description: Similar to vertical percent raw weights, except that the number of respondents in the high criterion group endorsing a response option is divided by the sum of the number of respondents in both the high and low groups endorsing an option. The resulting number is then divided by 10 to compute the weight for the response option. This method was first described by Stead and Shartle (1940).
Example: Weight = [nHigh/(nHigh + nLow)]/10 = [8/(8 + 4)]/10 = .667/10 = .067

Mean criterion*
Description: Response option weights are the mean criterion score of those respondents choosing that option. This method can be traced back to Guttman (1941). Although Devlin, Abrahams, and Edwards (1992) have proposed the creation of a modified version of this method that takes into account the number of participants endorsing each option, to date, there has been no published variation of the mean criterion method that does this.
Example: Weight = 3.68

Biserial raw weights*
Description: Individual response options are coded as 0s or 1s in a manner similar to dummy coding (except that there are k recoded variables rather than k−1 recoded variables as in dummy coding). Response option weights are the biserial correlation between endorsement of that option and the criterion. The biserial correlation coefficient entails a correlation between a dichotomous variable in a dataset and a continuous variable in a dataset. In contrast to a point biserial correlation, the biserial correlation coefficient attempts to estimate what the correlation coefficient would be if the dichotomous variable in the dataset (i.e., response option endorsement) reflects an underlying variable that is truly continuous, not dichotomous. This method can be traced back to Taylor and Ellison (1967).
Example: Weight = rbis = .14

Point biserial raw weights*
Description: Same as above, except that the point biserial coefficient is used instead of the biserial coefficient. The formula used to compute the point biserial correlation coefficient is a simplification or a special case of the formula used to compute the Pearson correlation. Specifically, it is designed for situations where one of the two variables is dichotomous and the second variable is continuous. If the formulae (or commands in a statistical software package) for the Pearson correlation are used in this situation, the resulting correlation coefficient has exactly the same value as if the point biserial formulae were used. Dean, Russell, and Farmer (2002) provide a recent overview and endorsement of this method.
Example: Weight = rpb = .09

Phi coefficient raw weights*
Description: Same as above, except that the criterion is dichotomized using the two criterion groups described for the vertical percent methods. Thus, a phi coefficient is computed. Note that like the point biserial correlation coefficient, the phi coefficient is a special case of the formula used to compute the Pearson correlation coefficient. The same formula and statistical software command for computing a Pearson correlation coefficient can also be used to compute a phi coefficient. The phi coefficient does not try to estimate the correlation between two underlying continuous variables that are represented in a dataset using two dichotomous variables (that type of correlation coefficient is termed a tetrachoric correlation and has not been used in biodata scoring). Lecznar and Dailey (1950) provide an early description of this method.
Example: Weight = φ = .13

Point biserial unit weights*
Description: Same as point biserial raw weights, except that weights of −1, 0, or 1 are assigned based on the statistical significance (p ≤ .05) of the correlation in the corresponding raw weights method. Hogan (1994) provides information and an endorsement for unit weighting of correlation coefficients when empirically keying biodata items.
Example: Because rpb = .09 (p = .27) is not significant, the weight is 0.

Phi coefficient unit weights*
Description: Same as phi coefficient raw weights, except that weights of −1, 0, or 1 are assigned based on the statistical significance (p ≤ .05) of the correlation in the corresponding raw weights method. Lecznar and Dailey (1950) provide an early description of this method.
Example: Because φ = .13 (p = .25) is not significant, the weight is 0.

Biserial unit weights
Description: In an effort for completeness, we include a listing for this method even though it does not exist. Currently, there is no commonly accepted statistical significance test for biserial correlations. The p values obtained from point biserial correlations have been offered as an alternative. However, for empirical keying purposes, the point biserial p values would generate an empirical key for the biserial method with weights identical to the point biserial method. Therefore, we did not incorporate a biserial unit weights method in our study.
Example: N/A

Standardized rational keying option weight (only used to compute the hybrid approximation weights)
Description: Rational keying option weights from all items and all response options are standardized to a z-scale, whereby the rational weights have a mean of 0 and a standard deviation of 1.
Example: In the current study, the unstandardized rational keying option weights had a mean of 4.9269 and a standard deviation of 2.0311. Thus, an unstandardized weight of 7 would be equivalent to a standardized weight of 1.0207.

Standardized empirical keying option weight (only used to compute the hybrid approximation weights)
Description: All of the empirically derived option weights can be standardized, using the same procedure for standardizing the rational keying option weights. Each method's weights should be standardized separately (because the different methods yield option weights on different scales).
Example: As an example, the point biserial raw weight of .09 can be standardized. In the current study, the unstandardized point biserial raw weights had a mean of −.0030 and a standard deviation of .0468. Thus, an unstandardized weight of .09 would be equivalent to a standardized weight of 1.9874.

Hybrid approximation weights*
Description: In this procedure, the standardized rational keying option weight for an option is summed with the corresponding standardized empirical keying option weight. Because the rational and empirical weights are standardized and unit weighted, the rational and empirical weights have an equal amount of influence on the resulting hybrid weight. This procedure provides an approximation of the hybrid weighting procedure and has been described by Bergman, Drasgow, Donovan, Henning, and Juraska (2006).
Example: A hybrid approximation weight can be computed for the point biserial raw weights method. Weight = standardized empirical keying option weight + standardized rational keying option weight = 1.9874 + 1.0207 = 3.0081.

Hybrid weights (only used to demonstrate the convergent validity of the hybrid approximation weights; not used in the main analyses of the current study)
Description: In this procedure, psychologists review the empirical keying option weights and make adjustments to the weights based on rationality. In essence, this procedure seeks to combine the rational and empirical procedures. Psychologists make judgments for each individual response option when using this procedure. Due to the large number of empirical keying procedures used and the multiple sample sizes studied, it was not feasible to conduct a true hybrid scoring system in the current study. See Olson-Buchanan et al. (1998), Mael and Hirsch (1993), and Mumford (1999) for more information.
Example: Hybrid weights can be generated for the point biserial raw weights method. Suppose that the theory behind the example item indicates that selection of the example response option would be associated with moderately high levels of performance. Because the point biserial raw weight of .09 is linked to slightly positive levels of performance, a psychologist could hybridize this weight to a higher value (e.g., .30).

Sum scoring item weighting
Description: In this procedure, each item score is simply summed or unit weighted with the other item scores to get an overall biodata scale score. The response options for the item can be scored using any of the empirical, hybrid, or rational scoring approaches. (See Brown, 1994; Hogan, 1994; Kluger, Reilly, & Russell, 1991; and Mumford & Owens, 1987, for information on unit vs. regression/correlational item weighting.)
Example: (1) × [Option 1 = −2.0; Option 2 = .2; Option 3 = 1.4; Option 4 = 2.2; Option 5 = 1.8] + (1) × [Option 1 = 3.2; Option 2 = 4.5; Option 3 = 1.3; Option 4 = −1.0; Option 5 = .01]

Stepwise regression item weighting
Description: In this procedure, each item score is weighted using stepwise regression to get an overall biodata scale score. The stepwise regression weights for each item are multiplied by each item's score (which comes from the response option weights) to give an overall biodata scale score. The response options for the item can be scored using any of the empirical, hybrid, or rational scoring approaches. Goldberg (1972) provides one of the earliest examples of stepwise regression item weighting.
Example: 2.3 × [Option 1 = −2.0; Option 2 = .2; Option 3 = 1.4; Option 4 = 2.2; Option 5 = 1.8] + 1.3 × [Option 1 = 3.2; Option 2 = 4.5; Option 3 = 1.3; Option 4 = −1.0; Option 5 = .01]

Multiple regression item weighting
Description: Although it is also conceivable to use multiple linear regression to weight items, we decided against including this procedure in our study. Entering all 139 biodata items into a multiple regression equation could prove unwieldy. In addition, the multicollinearity between some of the items could actually prevent some items from entering the regression equation. Moreover, using multiple linear regression to weight items is rarely performed; most biodata developers use stepwise regression instead.
Example: N/A

Note. This table provides a description of the different empirical, rational, and hybrid keying procedures studied. All of the procedures used in the current study are denoted with an asterisk (*). An example of each procedure is provided using data for a sample response option shown below:
Number (and percent) of respondents in high criterion (top 27%) group endorsing the sample response option: 8 (18.2%).
Number (and percent) of respondents in low criterion (bottom 27%) group endorsing the sample response option: 4 (9.5%).
Mean score on criterion for respondents endorsing the sample response option: 3.68.
Point biserial correlation between sample response option endorsement and criterion: rpb = .09 (p = .27).
Biserial correlation between sample response option endorsement and criterion: rbis = .14.
Phi coefficient correlation between sample response option endorsement and dichotomized criterion: φ = .13 (p = .25).


of empirical keying methods (Hogan, 1994; Mumford & Stokes, 1992). Methods in this family differ in how the raw differences between the two percentages (i.e., the percent of the high and the percent of the low criterion groups endorsing an option) are used to create option weights. A related family, the horizontal percent family (Stead & Shartle, 1940), uses a modified version (as shown in Table 1) of the formula for vertical percent option weights.
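
As a concrete illustration of the vertical percent raw weights idea, here is a minimal sketch with invented data (not the study's); the high and low groups stand in for the top and bottom 27% criterion groups.

```python
# Sketch of the vertical percent raw weights method (invented data):
# the option weight is the percent of the high criterion group endorsing
# the option minus the percent of the low criterion group endorsing it.
def vertical_percent_raw_weight(high_group, low_group, option):
    pct_high = 100.0 * sum(r == option for r in high_group) / len(high_group)
    pct_low = 100.0 * sum(r == option for r in low_group) / len(low_group)
    return pct_high - pct_low

high = ["A", "A", "B", "C", "A"]  # responses of the top criterion group
low = ["B", "C", "A", "B", "C"]   # responses of the bottom criterion group
print(vertical_percent_raw_weight(high, low, "A"))  # 60.0 - 20.0 = 40.0
```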

The Mean Criterion Family/Method

The mean criterion method, which can be traced to work by Guttman (1941), is one of the simplest and most straightforward empirical keying procedures. The weight for a response option is simply the mean criterion score for all respondents choosing that option. Despite its simplicity, this method has not enjoyed the same level of use as the other empirical keying methods.
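
The method can be sketched directly from its definition; the endorsement data and criterion scores below are invented for illustration.

```python
# Sketch of the mean criterion method (invented data): an option's weight
# is the mean criterion score of all respondents who endorsed that option.
def mean_criterion_weight(endorsed, criterion, option):
    scores = [c for e, c in zip(endorsed, criterion) if e == option]
    return sum(scores) / len(scores)

endorsed = ["A", "B", "A", "C", "A"]   # option chosen by each respondent
criterion = [4.0, 2.5, 3.5, 3.0, 4.5]  # each respondent's performance score
print(mean_criterion_weight(endorsed, criterion, "A"))  # (4.0+3.5+4.5)/3 = 4.0
```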

The Correlational Family of Empirical Keying Methods

In this family, scoring weights are a function of the correlation between endorsement of a response option (coded as 0 or 1) and scores on the criterion. These methods differ depending on the particular type of correlation coefficient that is obtained and whether (and how) the obtained correlation coefficients are rounded when creating the response option weights. The most straightforward correlational method is the point biserial raw weights method. In this method, the response option weights are simply the point biserial correlation between the variable representing response option endorsement (coded as 0 or 1) and the variable representing the criterion. The biserial raw weights method is similar to the point biserial raw weights method except that it entails the use of a biserial correlation coefficient.
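
Because the point biserial is a special case of the Pearson r, the point biserial raw weights method reduces to correlating 0/1 endorsement with the criterion. A sketch with invented data:

```python
import math

# Sketch of the point biserial raw weights method (invented data): compute
# the Pearson correlation between 0/1 endorsement and the criterion; for a
# dichotomous variable this equals the point biserial correlation.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

endorsed = [1, 0, 1, 0, 1, 0]               # 1 = respondent chose the option
criterion = [4.2, 2.1, 3.9, 2.8, 4.5, 3.0]  # criterion scores
weight = pearson(endorsed, criterion)       # strongly positive here
```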

The Rare Response Family/Method

Finally, the rare response method (Miner, 1965; Telenson, Alexander, & Barrett, 1983; Tompkins & Miner, 1957) weights item responses depending on how rarely they are endorsed. We decided not to include this family in our study because of previous research showing its poor validity (Aamodt & Pierce, 1987; Devlin, Abrahams, & Edwards, 1992). In addition, this procedure seems most relevant to clinical psychological assessments and applications (where there is an interest in identifying individuals who are outliers in their responses) as opposed to personnel selection assessments and applications.
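
The underlying idea can be illustrated with a simplified, invented weighting scheme; this is a sketch of the principle only, not the specific schemes in the papers cited above.

```python
from collections import Counter

# Simplified invented sketch of the rare response idea: options endorsed
# less often receive larger weights (here, one minus the endorsement rate).
def rare_response_weights(responses):
    counts = Counter(responses)
    n = len(responses)
    return {opt: 1 - counts[opt] / n for opt in counts}

responses = ["A"] * 8 + ["B"] * 2  # "B" is the rare response
weights = rare_response_weights(responses)
print(weights)                     # the rarer option "B" gets the larger weight
```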


The Item-Level Class of Biodata Scoring

Regardless of which option-level keying method is used, after response option weights are determined, the scored items must be compiled into scale scores. Two procedures for compiling item scores into scale scores exist. The most common procedure is to simply sum all of the item scores using unit weighting. Alternatively, item scores can be entered into a stepwise regression procedure, which selects and weights items in an attempt to maximize prediction. There is a conceptual rationale for using the stepwise procedure. Although the option-level empirical keying methods seek to maximize item-level validity, they do not take into account multicollinearity between items. The stepwise method aims to maximize scale-level validity while also decreasing the length of the scale by dropping items with little unique criterion-related variance. One drawback to using stepwise regression is its strong tendency to capitalize on chance, especially with small sample sizes. This study examined two methods for keying at the item level: unit weighting of items and stepwise regression weighting of items. Note that item-level keying was combined with option-level keying in this study. Thus, both methods of item-level keying were combined with the different methods of option-level keying.
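
The two item-level procedures can be contrasted in a small simulation. This is a hedged sketch with randomly generated data (not the study's), and the forward selection rule below is one simple variant of stepwise regression.

```python
import numpy as np

# Simulated data: 5 keyed item scores; only the first two predict the criterion.
rng = np.random.default_rng(0)
items = rng.normal(size=(200, 5))
criterion = items[:, 0] + 0.5 * items[:, 1] + rng.normal(size=200)

# Sum scoring: unit-weight every item into a scale score.
sum_scores = items.sum(axis=1)

# Forward stepwise regression: greedily add the item whose inclusion most
# improves R-squared, stopping when the gain falls below a threshold.
def forward_stepwise(X, y, min_gain=0.05):
    selected, best_r2 = [], 0.0
    while len(selected) < X.shape[1]:
        r2_with = {}
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = np.column_stack([X[:, selected + [j]], np.ones(len(y))])
            beta, *_ = np.linalg.lstsq(cols, y, rcond=None)
            resid = y - cols @ beta
            r2_with[j] = 1 - resid.var() / y.var()
        best_j = max(r2_with, key=r2_with.get)
        if r2_with[best_j] - best_r2 < min_gain:
            break
        selected.append(best_j)
        best_r2 = r2_with[best_j]
    return selected

print(forward_stepwise(items, criterion))  # retains only the predictive items
```

As the text notes, the stepwise variant shortens the scale by dropping items with little unique criterion-related variance, at the cost of capitalizing on chance in small samples.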

Past Research Comparing Empirical Keying Procedures

There are a large number of studies that use a single method for em-pirical keying; however, it is difficult to compare the validity coefficientsacross studies due to differences in jobs, items, sample sizes, and so on.Some studies only reported a foldback validity coefficient and did notcross-validate their results (e.g., Aamodt & Pierce, 1987), whereas otherstudies present results comparing only a small number of different empir-ical keying techniques (e.g., Lecznar & Dailey, 1950; Lefkowitz, Gebbia,Balsam, & Dunn, 1999; Malone, 1978; Reiter-Palmon & Connelly, 2000).Devlin et al. (1992) conducted the best comparison of different empiricalkeying techniques at different sample sizes. They compared several differ-ent procedures of empirically keying biodata instruments using one typeof respondents (i.e., applicants to the U.S. Naval Academy), one criterion(i.e., a measure equivalent to GPA), and five sample sizes. For each sam-ple size, two-thirds of the sample was used for the developmental groupand one-third was used for the cross-validation group. They directly com-pared the cross-validities of the vertical percent, horizontal percent, phicoefficient, mean criterion, and rare response study procedures. Theyfound that the vertical percent raw weights and net weights methods gave

JEFFREY M. CUCINA ET AL. 399

the highest cross-validities, whereas the rare response and mean criterion methods gave the lowest cross-validities.

This study is, in part, a modified replication of Devlin et al.'s (1992) work. However, this study addresses several questions that were not studied by Devlin et al., such as using traditional sum scoring versus stepwise regression scoring and using hybrid and rational keying. This study also uses data from a larger number of occupations with overall job performance (as opposed to training performance) as the criterion.

Research Question #3: Which Biodata Scoring Procedures Should Practitioners Use?

In addition to examining criterion-related validity, this study will also provide advice to practitioners as to which biodata scoring procedures they should use. There are many different empirical keying procedures, and unfortunately, none of them are automatically implemented using built-in commands in statistical packages commonly used by industrial–organizational psychologists (e.g., SPSS, SAS, Excel, and Access). Instead, biodata developers must go through laborious steps to create empirical scoring keys. As a result, it is often not feasible to run all of the empirical keying procedures and identify which works best. This study will provide advice on which biodata scoring procedures can easily be implemented under different conditions (e.g., sample size, availability of a criterion, whether legal challenges are likely). In the Discussion section, a decision tree will be presented that explains how the results from the study can be used to guide practitioners' decisions about biodata scoring procedures.

Research Question #4: Do the Different Biodata Scoring Procedures Yield Similar (i.e., Highly Correlated) Scores?

This study will also examine the correlations between scores on the different biodata scoring procedures. If two procedures yield highly intercorrelated scores and if they also both yield similar criterion-related validities, then an argument can be made that both procedures are roughly equivalent choices for biodata developers. In this study, we will examine the intercorrelations between scores from the different biodata scoring procedures. In addition, a principal components analysis (PCA) will be conducted on the scores to identify underlying components. This should help to summarize the large number of correlation coefficients that will be obtained in this study.
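As a sketch of how a PCA can summarize a block of intercorrelated scores: in the simulated data below (not the study's data), three scoring procedures share a common component while a fourth is unrelated, and the first eigenvalue of the correlation matrix absorbs most of the shared variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated scores: three procedures share a common component; a fourth
# is unrelated (all values are illustrative).
base = rng.normal(size=n)
scores = np.column_stack([
    base + rng.normal(scale=0.3, size=n),
    base + rng.normal(scale=0.3, size=n),
    base + rng.normal(scale=0.3, size=n),
    rng.normal(size=n),
])

# PCA on the correlation matrix: the eigenvalues indicate how many
# underlying components the four sets of scores really contain.
corr = np.corrcoef(scores, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted descending
print(eigenvalues.round(2))  # first eigenvalue dominates
```

A large first eigenvalue with the remaining eigenvalues near zero (except for the independent column) is the pattern one would expect if the scoring procedures are roughly interchangeable.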

400 PERSONNEL PSYCHOLOGY

Research Question #5: Does Sample Size Impact the Criterion-Related Validity of the Different Biodata Scoring Procedures?

Given that different formulae are used for computing the option-level empirical keying procedures and the stepwise regression item-level empirical keying procedure is very data-driven, it seems plausible that the different scoring procedures have different sample size requirements. This study will examine whether or not the cross-validities of the different biodata scoring procedures are impacted by sample size.

Research Question #6: What Are the Sample Size Requirements for Empirical and Hybrid Keying?

There are several published "rules of thumb" regarding appropriate sample sizes for empirical keying. For example, Hogan (1994) recommends using 500 to 1,000 subjects in both the developmental and cross-validation groups when keying 100 or more predictors (where each response option is a predictor). Other researchers have suggested that using the vertical percent method requires between 75 and 125 subjects in each of the high and low criterion groups (Cascio, 1982; England, 1971). For the correlational family, researchers have suggested using between 400 and 1,000 subjects (Hunter & Hunter, 1984; Mumford & Owens, 1987). However, none of these "rules of thumb" are based on published empirical research.

Research by Dean and Russell (2001) compared the cross-validity of the point biserial raw weights method for different sample sizes drawn from three large samples. In general, their results suggest that there is less benefit in having more than 500 subjects when empirically keying a biodata inventory. In a study comparing the cross-validities of different empirical keying procedures at different sample sizes, Devlin et al. (1992) found little increase in cross-validity for samples larger than 300. Although both of these studies are quite notable, our study (a) examines many more empirical keying procedures, (b) investigates the rational and hybrid approaches, and (c) investigates the item-level class of biodata scoring procedures.

Research Question #7: Does Hybrid Keying Decrease the Sample Size Requirements?

It is important to note that rational key development is not sample-size-dependent because data are not used to derive the response option and item scoring weights. Instead, data are only used to estimate the


criterion-related validity1 of a rational key, not to create the rational key. In contrast, empirical keying requires a developmental sample of sufficient size to create a stable scoring key and a cross-validation sample of sufficient size to estimate the cross-validity of the scoring key. Therefore, rational keying should require fewer cases in a dataset than empirical keying. Because the hybrid approach is a compromise between the empirical and rational approaches, it also seems plausible that the rational qualities of a hybrid key could reduce its sample size requirements. This study will investigate this issue.

Overview of the Current Study

Our research compared rational, empirical, and hybrid keying procedures at sample sizes relevant for practice. Archival data from a large-scale concurrent validation study with 5,272 employees were used. The dataset contains responses to a biodata instrument with a broad range of item content as well as supervisory ratings of job performance for a large number of professional and administrative jobs in the federal government. Previous meta-analytic work with this dataset demonstrated that, when empirically keyed (using a single key), the instrument demonstrated validity for all occupations and organizations studied (Gandy, Dye, & MacLane, 1994; Gandy, Outerbridge, Sharf, & Dye, 1989).2

The analyses for this study were conducted in two phases, with a large number of keying procedures studied in Phase 1 and with further analyses conducted on a limited number of keying procedures in Phase 2. In Phase 1, we examined the criterion-related validity and intercorrelations of numerous empirical, hybrid, and rational keying approaches to scoring biodata. In total, 50 different scoring keys were created at seven different developmental sample sizes and were replicated, at each sample size, five times. The results from Phase 1 were used to identify the main types of biodata scoring procedures that differ from one another in terms of criterion-related validity.

One limitation of the results of Phase 1 is that only five replications were conducted per sample size, and only seven sample sizes were investigated. Therefore, in Phase 2, six scoring keys were developed for 106 sample sizes with 100 replications per sample size. The six scoring

1A power analysis using procedures outlined in Schmidt, Hunter, and Urry (1976) or Cohen (1988) could be used to determine the sample size requirements for demonstrating a statistically significant relationship between scores on a rationally keyed biodata instrument and scores on a criterion.

2Based on simulation work by May and Hittner (1997), this statistic gave the best power and adequate Type I error rates for the conditions most similar to this study.


TABLE 2
Demographic Statistics

| Demographic          | Frequency | Percent |
|----------------------|-----------|---------|
| Gender               |           |         |
|   Male               | 3,234     | 61.3    |
|   Female             | 2,038     | 38.7    |
| Race/national origin |           |         |
|   White              | 4,037     | 76.6    |
|   African American   | 770       | 14.6    |
|   Hispanic           | 269       | 5.1     |
|   Native American    | 33        | 0.6     |
|   Asian American     | 158       | 3.0     |
|   Missing            | 5         | 0.1     |

| | Mean | Standard deviation |
|---|---|---|
| Age | 34.5 | 11.0 |

keys studied in Phase 2 were identified based on the results of Phase 1 and their ease of implementation.

Method

Participants and Data Collection Procedure

An archival sample of 5,272 federal employees who participated in a concurrent validation study of the Individual Achievement Record (IAR; Gandy et al., 1989, 1994) was used. All participants had recently been hired externally for entry-level professional and administrative positions (e.g., investigators, management and program analysts, production control workers, contract specialists and administrators, etc.). Demographic information is provided in Table 2. More information on the administration and procedures can be found in Gandy et al. (1994).

Biodata Instrument

This study used data from the experimental version of the IAR, which contained 139 items. Gandy and his colleagues (Gandy et al., 1989, 1994) developed the IAR, a biodata measure for federal government selection. The IAR was one component of the entry-level examination for about 100 professional and administrative career jobs. Gandy et al. (1989) drew upon traditional biodata items and also developed new items to fit the federal government context. They deliberately included items that were hypothesized to measure a variety of cognitive, interpersonal, and


motivational constructs. Job analysis data on professional and administrative federal jobs were used to guide item development. Most response options formed a continuum; however, some items had categorical options. Both objective and subjective items were included. The items were reviewed to ensure job relatedness and address privacy concerns. Sample items and more information on the instrument can be found in Gandy et al. (1994).

Criterion

Research-based supervisory ratings of job performance served as the criterion. The performance appraisal was based on the descriptive rating scale (U.S. Department of Labor, 1974) and was successfully used by the U.S. Employment Service in validation work (e.g., the GATB, U.S. Department of Labor, 1970). The measure included six items designed to be applicable to a wide range of occupations and assessed job knowledge, accuracy, quality and quantity of work, and efficiency. A sample item can be found in Gandy et al. (1994). Scores on the items were averaged into a single criterion score, with a coefficient alpha of .90. A factor analysis demonstrated that the items loaded heavily on a general performance factor.
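Averaging item ratings into a composite and checking coefficient alpha can be sketched as follows. The five-employee rating matrix is fabricated for illustration (the study's actual alpha was .90):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an n_respondents x n_items rating matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical six-item performance ratings for five employees.
ratings = np.array([
    [4, 4, 5, 4, 4, 5],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 5, 5],
    [3, 3, 3, 2, 3, 3],
    [4, 5, 4, 4, 4, 4],
])
criterion = ratings.mean(axis=1)  # one criterion score per employee
print(round(cronbach_alpha(ratings), 3))  # 0.97 for this toy matrix
```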

Phase 1

In this phase, the full dataset of 5,272 cases was randomly divided into smaller datasets with the following developmental sample sizes: 150, 300, 500, 700, 1,000, 2,000, and 3,515, which represents two-thirds of the entire dataset. This process was repeated five times, yielding 35 total datasets (five at each of the seven developmental sample sizes). The 50 different scoring keys were generated for each of the 35 samples. All samples were cross-validated using a large holdout group of 1,757 cases (which represents one-third of the entire dataset). We chose to use a large holdout because it should yield more stable cross-validities (due to the increased power and diminished sampling error associated with such a large holdout group).

Empirical Keying Procedures

The most commonly described empirical keying procedures are presented in Table 1. The procedures used in this study are denoted in Table 1 with an asterisk (*) and all 50 (each of which were combinations of the various procedures in Table 1) are listed in Table 2. After item scores were generated for each of the procedures, two variations of


scale scores were calculated: sum scores and stepwise regression scores. Sum scores were simply the sum of the scored items using unit weighting. A stepwise regression analysis was also performed on the scored items in order to select and differentially weight the items according to their unique validities. Only data from the developmental sample were used in the analyses to create the stepwise regression equations, and the predicted values served as the scale scores.
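Stepwise selection of scored items can be approximated with a greedy forward search; the partial-F entry rule below is one common variant (a simplified stand-in for an SPSS-style stepwise run), and the data are simulated with arbitrary item counts and effect sizes:

```python
import numpy as np

def forward_stepwise(X, y, f_to_enter=10.0):
    """Greedy forward selection: repeatedly add the item that most reduces
    the residual sum of squares, as long as its partial F exceeds the
    entry threshold."""
    n, p = X.shape
    selected = []
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        # Find the candidate item giving the lowest residual sum of squares.
        best_item, best_rss = None, None
        for j in remaining:
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = ((y - A @ beta) ** 2).sum()
            if best_rss is None or rss < best_rss:
                best_item, best_rss = j, rss
        # Partial F test of the best candidate against the current model.
        A0 = np.column_stack([np.ones(n), X[:, selected]])
        beta0, *_ = np.linalg.lstsq(A0, y, rcond=None)
        rss0 = ((y - A0 @ beta0) ** 2).sum()
        df = n - len(selected) - 2
        f = (rss0 - best_rss) / (best_rss / df)
        if f < f_to_enter:
            break
        selected.append(best_item)
    return selected

rng = np.random.default_rng(1)
n = 300
items = rng.normal(size=(n, 6))
# Simulated criterion: only items 0 and 3 carry signal.
y = 0.8 * items[:, 0] + 0.5 * items[:, 3] + rng.normal(size=n)
print(sorted(forward_stepwise(items, y)))  # typically selects items 0 and 3
```

With small developmental samples, the same search routinely admits noise items, which is the capitalization-on-chance problem noted above.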

Rational Keying Procedures

Three psychologists developed a rational key. Item response options were first rated individually on a scale ranging from 1 = performance is poor, consistently substandard to 10 = performance is extremely good, consistently superior. The raters then met to reach consensus for all item response options. In addition, during the rational keying, the psychologists identified which dimension(s) each biodata item assessed and used this information when developing a theory for each item. The four dimensions of the biodata items described by Gandy et al. (1994) were used (i.e., work competency, high school achievement, college achievement, and leadership skills). Because shrinkage would not occur when the rational keying with unit weighting method was used, we did not attempt to cross-validate this method; instead, the validity from the large holdout group was used. However, because stepwise regression capitalizes on chance, cross-validation was used for the rational keying with stepwise regression weighting of items method.

Hybrid Keying Procedure

Most biodata developers using the hybrid approach individually review each response option's empirical weight and determine if it is consistent with theory. Given the large number of keying procedures and sample sizes in this study, an individual review of each response option was not feasible. An approximation of the hybrid approach typically used by biodata developers was used in this study. Specifically, empirical keying weights and rational keying weights were each standardized and then summed using unit weighting to obtain hybrid keying weights for each response option. The approach used in this study approximates the scores that would be obtained if psychologists gave empiricism and theory equal consideration when determining the scoring key for each individual response option. There is precedent for using this approximation: Bergman, Drasgow, Donovan, Henning, and Juraska (2006) used it in a study examining scoring procedures for situational judgment tests.
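The standardize-and-sum operationalization can be expressed compactly. The two weight vectors below are hypothetical stand-ins for one item's empirical and rational option weights:

```python
import numpy as np

# Hypothetical option weights for eight response options of one item.
empirical_weights = np.array([0.21, 0.05, -0.10, 0.14, -0.02, 0.30, -0.25, 0.08])
rational_weights = np.array([7.0, 4.0, 2.0, 6.0, 3.0, 8.0, 1.0, 5.0])

def standardize(w):
    # z-score so both keys are on the same scale
    return (w - w.mean()) / w.std(ddof=0)

# Hybrid weight: standardize each key, then sum with unit weighting,
# giving empiricism and theory an equal say for every option.
hybrid_weights = standardize(empirical_weights) + standardize(rational_weights)
print(hybrid_weights.round(2))
```

Options that both keys favor (here, option 6) end up with the largest hybrid weight; options that both keys penalize end up with the smallest.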


In order to demonstrate that our approximation of hybrid keying was similar to the traditional procedure for hybrid keying, we obtained a hybridized scoring key that was created during the original development of the biodata inventory and was developed by two psychologists using the traditional procedure for hybrid keying with the point biserial raw weights method. We then correlated each participant's score on the traditional hybrid key with our approximation using the hybrid key for the point biserial raw weights method. The correlation between the two hybrid keys was .959 (p < .001). In addition, the criterion-related validity of our approximation for hybrid keying (r = .317) was nearly identical to that from the traditional hybrid key (r = .316). Thus, our approximation of hybrid keying produces scores that are nearly identical to those from the traditional procedure for hybrid keying.

Phase 2

The same dataset consisting of 5,272 cases used in Phase 1 was also used in Phase 2. Smaller datasets were created using the full dataset by randomly selecting cases from the full dataset, without replacement. The 106 smaller datasets ranged in size from 50 to 5,250 in increments of 50 cases, except for the largest dataset, which contained all 5,272 cases. A total of 100 datasets were generated at each sample size. All option-level empirical keying was carried out using the point biserial option weights method because in Phase 1 it yielded validities that were comparable to the other procedures studied and because it could more easily be automated using SPSS syntax than the other procedures. The rational scoring key developed in Phase 1 was also applied to each of the datasets. In addition, the same operationalization of the hybrid scoring key used in Phase 1 was also used in Phase 2, as were two item weighting families: unit weighting and stepwise regression weighting of items.

A series of SPSS macros were written to (a) randomly select 100 samples of a particular size from the full dataset, (b) randomly assign each case in a sample to one of three thirds, (c) obtain the point biserial correlations between each response option and the criterion, (d) compute hybridized response option weights for each key, (e) score each participant's responses to create scale scores for each key using unit weighting of items and stepwise regression, (f) obtain the cross-validities for each procedure, and (g) compile standard deviations and averages of the cross-validities from each replication and each sample size. A triple cross-validation strategy was used, whereby an empirical key is created using two-thirds of a dataset and is cross-validated on the remaining one-third of the dataset. The process is carried out three times for each of the three possible arrangements of the thirds into developmental and cross-validation samples.
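Steps (b), (c), and (f) and the triple cross-validation strategy can be mirrored in a few lines. Everything below (response patterns, option values, noise level) is simulated for illustration, with each point biserial option weight computed as the correlation between that option's 0/1 indicator and the criterion:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_items, n_options = 300, 10, 4

# Simulated responses (option chosen per item) and a criterion that is
# partly determined by which options were chosen.
responses = rng.integers(0, n_options, size=(n, n_items))
true_option_values = rng.normal(size=(n_items, n_options))
criterion = (true_option_values[np.arange(n_items), responses].sum(axis=1)
             + rng.normal(scale=3, size=n))

def point_biserial_key(responses, criterion):
    """Option weight = correlation of the option's 0/1 indicator with the criterion."""
    key = np.zeros((n_items, n_options))
    for i in range(n_items):
        for o in range(n_options):
            indicator = (responses[:, i] == o).astype(float)
            key[i, o] = np.corrcoef(indicator, criterion)[0, 1]
    return key

def score(responses, key):
    return key[np.arange(n_items), responses].sum(axis=1)

# Triple cross-validation: key each two-thirds, validate on the held-out
# third, then average the three cross-validities.
idx = rng.permutation(n)
thirds = np.array_split(idx, 3)
cross_validities = []
for h in range(3):
    holdout = thirds[h]
    develop = np.concatenate([thirds[j] for j in range(3) if j != h])
    key = point_biserial_key(responses[develop], criterion[develop])
    scores = score(responses[holdout], key)
    cross_validities.append(np.corrcoef(scores, criterion[holdout])[0, 1])
print(round(float(np.mean(cross_validities)), 3))
```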


The average of the three cross-validities is used as an estimate of the true cross-validity of an empirical key generated using the entire sample. The analyses were carried out over a period of approximately 12 months using three computers running SPSS in Windows.

Results

In the interest of parsimony, only the key results are presented in this section. More information (e.g., correlation matrices, the results of PCAs, statistical tests comparing validity coefficients, etc.) is available upon request from the authors. The results of the analyses for Phase 1 are presented in Table 3, which displays the cross-validity coefficients for the different procedures at each sample size. In Table 3, the median cross-validities at the seven sample sizes studied are presented in columns, and each scoring method is presented in rows. As seen in the table, there is a tendency for the cross-validity coefficients to increase as the sample size increases. For example, the empirical biserial raw weights with unit weighting method had a median cross-validity of .275 when the sample size was 150 and a median cross-validity of .352 at the largest sample size. The cross-validity coefficients for stepwise regression weighting of items tend to be lower than unit weighting at smaller sample sizes; however, they have higher validities at the largest sample size. The rational key with unit weighting of items method has a constant validity (r = .164) across all sample sizes because (unlike the other methods) the same key was used at all sample sizes.

In Table 4, we present the results of Phase 2, whereby the empirical, hybrid, and rational approaches were studied at 106 sample sizes using both unit and stepwise weighting of items. The standard deviations of the 100 triple cross-validities (from the 100 replications) are also displayed at each sample size. The standard deviations provide an indication of the variability of the triple cross-validities and could be viewed as a proxy for the power (or probability of detecting validity) at each sample size. The standard deviations are largest at the two smallest sample sizes (in the .160 range) and decrease to less than .010 at the largest sample sizes, indicating less variability in cross-validity estimates when very large samples are used. The results in Table 4 show a positive trend between validity and sample size, with the trend asymptoting at the larger sample sizes. The cross-validities asymptote quicker when unit weighting of items is used as opposed to when the items are weighted with stepwise regression. Figure 2 presents the results of Table 4 graphically, whereby the mean validity coefficients at each sample size are plotted for each procedure. In Figure 2, plots of the mean triple cross-validities from Table 4 are


TABLE 3
Cross-Validities at Different Developmental Sample Sizes Using a Large (n = 1,757) Holdout Sample

| Procedure | Scoring procedure | Scaling procedure | n = 150 | n = 300 | n = 500 | n = 700 | n = 1,000 | n = 2,000 | n = 3,515 | Median correlation with similar methods^a |
|---|---|---|---|---|---|---|---|---|---|---|
| Rational† | Rational | Unit | .164 | .164* | .164** | .164** | .164** | .164** | .164** | N/A |
| | | Stepwise | .206 | .246** | .260** | .321** | .324** | .371** | .404** | N/A |
| Correlational biserial | Empirical | Unit | .275* | .295** | .319** | .340** | .337** | .349** | .352** | .990** |
| | | Stepwise | .141 | .233** | .269** | .301** | .320** | .385** | .415** | .920** |
| | Hybrid | Unit | .250* | .258** | .266** | .281** | .281** | .298** | .306** | .990** |
| | | Stepwise | .165 | .254** | .286** | .326** | .337** | .386** | .409** | .930** |
| Phi coefficient—raw weights | Empirical | Unit | .286* | .311** | .325** | .355** | .346** | .351** | .358** | .970** |
| | | Stepwise | .162 | .216** | .294** | .298** | .316** | .381** | .401** | .930** |
| | Hybrid | Unit | .272* | .275** | .290** | .309** | .308** | .308** | .317** | .990** |
| | | Stepwise | .165 | .218** | .282** | .318** | .326** | .375** | .407** | .930** |
| Phi coefficient—unit weights | Empirical | Unit | .269* | .290** | .319** | .347** | .332** | .339** | .341** | .970** |
| | | Stepwise | .187 | .210* | .271** | .305** | .310** | .368** | .387** | .900** |
| | Hybrid | Unit | .255* | .260** | .284** | .300** | .304** | .304** | .314** | .980** |
| | | Stepwise | .185 | .230** | .302** | .327** | .328** | .376** | .405** | .930** |
| Point biserial—raw weights | Empirical | Unit | .281* | .318** | .327** | .343** | .349** | .350** | .355** | .970** |
| | | Stepwise | .149 | .221** | .263** | .306** | .326** | .386** | .406** | .920** |
| | Hybrid | Unit | .277* | .277** | .291** | .304** | .308** | .309** | .317** | .990** |
| | | Stepwise | .172 | .245** | .273** | .316** | .335** | .381** | .405** | .940** |
| Point biserial—unit weights | Empirical | Unit | .260* | .308** | .321** | .335** | .338** | .339** | .342** | .970** |
| | | Stepwise | .158 | .205* | .266** | .311** | .312** | .375** | .400** | .900** |
| | Hybrid | Unit | .252* | .268** | .284** | .297** | .301** | .301** | .311** | .980** |
| | | Stepwise | .163 | .223** | .277** | .323** | .333** | .384** | .400** | .910** |
| Mean criterion | Empirical | Unit | .251* | .248** | .291** | .316** | .306** | .338** | .338** | .960** |
| | | Stepwise | .177 | .224** | .260** | .304** | .313** | .380** | .407** | .880** |
| | Hybrid | Unit | .217 | .230** | .232** | .247** | .239** | .267** | .279** | .980** |
| | | Stepwise | .187 | .263** | .290** | .310** | .330** | .387** | .407** | .890** |
| Horizontal percent | Empirical | Unit | .274* | .262** | .281** | .318** | .309** | .333** | .342** | .970** |
| | | Stepwise | .168 | .249** | .282** | .292** | .314** | .380** | .415** | .900** |
| | Hybrid | Unit | .233* | .238** | .234** | .251** | .250** | .268** | .279** | .980** |
| | | Stepwise | .159 | .251** | .289** | .298** | .323** | .388** | .411** | .900** |
| Vertical percent 5% | Empirical | Unit | .273* | .307** | .309** | .347** | .332** | .340** | .341** | .940** |
| | | Stepwise | .165 | .220** | .279** | .319** | .323** | .362** | .385** | .860** |
| | Hybrid | Unit | .261* | .277** | .282** | .299** | .299** | .297** | .300** | .980** |
| | | Stepwise | .159 | .216** | .282** | .318** | .333** | .377** | .396** | .880** |
| 10% | Empirical | Unit | .276* | .293** | .325** | .344** | .334** | .339** | .346** | .910** |
| | | Stepwise | .183 | .182* | .244** | .303** | .284** | .330** | .342** | .700** |
| | Hybrid | Unit | .267* | .263** | .283** | .297** | .288** | .292** | .297** | .970** |
| | | Stepwise | .190 | .197* | .276** | .320** | .328** | .382** | .405** | .850** |
| Assigned weights | Empirical | Unit | .260* | .295** | .316** | .321** | .313** | .313** | .304** | .760** |
| | | Stepwise | .185 | .206* | .266** | .286** | .297** | .312** | .303** | .590** |
| | Hybrid | Unit | .262* | .261** | .269** | .271** | .271** | .267** | .261** | .920** |
| | | Stepwise | .144 | .193* | .260** | .326** | .329** | .380** | .408** | .820** |
| Net weights | Empirical | Unit | .277* | .313** | .320** | .355** | .345** | .351** | .360** | .970** |
| | | Stepwise | .149 | .182* | .282** | .298** | .322** | .372** | .412** | .910** |
| | Hybrid | Unit | .270* | .276** | .289** | .308** | .306** | .306** | .315** | .990** |
| | | Stepwise | .186 | .247** | .280** | .316** | .329** | .381** | .407** | .930** |
| Raw weights | Empirical | Unit | .293* | .311** | .331** | .346** | .347** | .349** | .351** | .960** |
| | | Stepwise | .180 | .202* | .272** | .304** | .309** | .362** | .379** | .870** |
| | Hybrid | Unit | .277* | .279** | .294** | .309** | .309** | .308** | .317** | .980** |
| | | Stepwise | .176 | .202* | .263** | .319** | .336** | .363** | .397** | .930** |

Note. Results for the different keying procedures are presented with separate rows for unit weighted (or sum scored) and stepwise regression weighted items.
^a Values in this column represent the median intercorrelation between a particular empirical scoring method and all of the other scoring methods that used the same scoring approach (i.e., empirical or hybrid) and item-level family/scaling procedure (i.e., unit or stepwise). Values in other cells represent the median cross-validity at each sample size, obtained using the large holdout sample (n = 1,757); however, significance values are based on what would be obtained with a smaller holdout sample that would be one-half the particular developmental sample size (under a triple cross-validation strategy).
*p < .05; **p < .01.

TABLE 4
Triple Cross-Validities of Keying Procedures at Different Combined (i.e., Developmental and Holdout) Sample Sizes

All entries are triple cross-validated validity coefficients, Mean (SD).

| n | Rational key, unit weighting | Rational key, stepwise weighting | Empirical key, unit weighting | Empirical key, stepwise weighting | Hybrid key, unit weighting | Hybrid key, stepwise weighting |
|---|---|---|---|---|---|---|
| 50 | .128 (.145) | .061 (.176) | .167 (.184) | .032 (.160) | .189 (.158) | .039 (.170) |
| 100 | .159 (.088) | .128 (.129) | .216 (.127) | .070 (.113) | .221 (.106) | .089 (.116) |
| 150 | .148 (.076) | .141 (.096) | .261 (.090) | .127 (.091) | .241 (.074) | .133 (.092) |
| 200 | .155 (.069) | .178 (.088) | .269 (.089) | .127 (.085) | .248 (.076) | .148 (.094) |
| 250 | .151 (.059) | .175 (.085) | .279 (.068) | .150 (.082) | .251 (.058) | .170 (.083) |
| 300 | .145 (.055) | .189 (.074) | .276 (.056) | .163 (.072) | .248 (.048) | .178 (.070) |
| 350 | .151 (.057) | .211 (.074) | .291 (.065) | .188 (.070) | .258 (.055) | .200 (.059) |
| 400 | .152 (.055) | .226 (.057) | .301 (.044) | .192 (.061) | .266 (.043) | .207 (.063) |
| 450 | .141 (.048) | .234 (.056) | .298 (.043) | .203 (.059) | .258 (.042) | .214 (.056) |
| 500 | .153 (.039) | .235 (.059) | .291 (.044) | .200 (.051) | .258 (.041) | .212 (.053) |
| 550 | .144 (.044) | .250 (.055) | .305 (.040) | .218 (.056) | .265 (.039) | .235 (.057) |
| 600 | .142 (.041) | .255 (.055) | .304 (.034) | .221 (.054) | .263 (.032) | .234 (.054) |
| 650 | .144 (.039) | .255 (.050) | .303 (.033) | .234 (.050) | .263 (.031) | .246 (.050) |
| 700 | .145 (.032) | .262 (.048) | .306 (.035) | .239 (.050) | .267 (.030) | .250 (.054) |
| 750 | .150 (.034) | .266 (.042) | .308 (.027) | .247 (.045) | .269 (.027) | .261 (.046) |
| 800 | .147 (.029) | .273 (.048) | .307 (.028) | .241 (.044) | .268 (.027) | .258 (.039) |
| 850 | .149 (.036) | .279 (.042) | .313 (.032) | .258 (.043) | .273 (.033) | .270 (.043) |
| 900 | .151 (.033) | .287 (.041) | .316 (.027) | .259 (.042) | .276 (.029) | .280 (.041) |
| 950 | .143 (.031) | .274 (.039) | .306 (.026) | .259 (.042) | .267 (.026) | .271 (.036) |
| 1,000 | .144 (.027) | .285 (.044) | .309 (.024) | .263 (.038) | .269 (.023) | .273 (.036) |
| 1,050 | .147 (.028) | .289 (.035) | .312 (.025) | .272 (.036) | .273 (.025) | .280 (.035) |
| 1,100 | .153 (.026) | .294 (.034) | .312 (.024) | .272 (.036) | .275 (.023) | .288 (.035) |
| 1,150 | .150 (.029) | .305 (.035) | .316 (.021) | .283 (.036) | .276 (.021) | .299 (.031) |
| 1,200 | .145 (.025) | .298 (.034) | .316 (.023) | .288 (.032) | .276 (.023) | .294 (.033) |
| 1,250 | .148 (.026) | .298 (.035) | .316 (.025) | .288 (.035) | .276 (.023) | .297 (.034) |
| 1,300 | .147 (.024) | .303 (.033) | .314 (.020) | .289 (.027) | .275 (.020) | .301 (.028) |
| 1,350 | .144 (.024) | .307 (.032) | .316 (.019) | .295 (.030) | .274 (.019) | .303 (.031) |
| 1,400 | .150 (.021) | .313 (.029) | .317 (.020) | .297 (.032) | .277 (.019) | .309 (.028) |
| 1,450 | .149 (.023) | .309 (.030) | .317 (.019) | .301 (.031) | .277 (.019) | .310 (.030) |
| 1,500 | .144 (.021) | .316 (.031) | .316 (.016) | .304 (.025) | .276 (.017) | .310 (.027) |
| 1,550 | .150 (.022) | .316 (.029) | .319 (.017) | .308 (.029) | .280 (.018) | .316 (.030) |
| 1,600 | .148 (.022) | .315 (.023) | .318 (.016) | .313 (.027) | .279 (.017) | .319 (.026) |
| 1,650 | .148 (.020) | .324 (.029) | .320 (.018) | .317 (.030) | .280 (.018) | .320 (.028) |
| 1,700 | .147 (.022) | .323 (.027) | .317 (.017) | .311 (.027) | .277 (.017) | .323 (.027) |
| 1,750 | .150 (.018) | .323 (.025) | .321 (.015) | .317 (.024) | .281 (.014) | .324 (.024) |
| 1,800 | .146 (.017) | .321 (.027) | .316 (.017) | .315 (.028) | .276 (.015) | .320 (.024) |
| 1,850 | .146 (.019) | .324 (.026) | .317 (.015) | .320 (.027) | .277 (.015) | .325 (.025) |
| 1,900 | .150 (.017) | .326 (.024) | .319 (.014) | .321 (.024) | .279 (.014) | .329 (.025) |
| 1,950 | .146 (.018) | .327 (.025) | .319 (.015) | .321 (.024) | .279 (.014) | .327 (.025) |
| 2,000 | .146 (.017) | .327 (.022) | .319 (.014) | .322 (.021) | .278 (.014) | .329 (.023) |
| 2,050 | .144 (.016) | .328 (.025) | .317 (.015) | .323 (.027) | .277 (.014) | .328 (.025) |
| 2,100 | .144 (.017) | .330 (.020) | .318 (.014) | .332 (.022) | .277 (.014) | .334 (.021) |
| 2,150 | .147 (.015) | .330 (.018) | .317 (.013) | .326 (.021) | .278 (.013) | .333 (.021) |
| 2,200 | .147 (.016) | .335 (.020) | .320 (.012) | .331 (.021) | .280 (.012) | .336 (.020) |
| 2,250 | .146 (.017) | .335 (.019) | .319 (.014) | .328 (.021) | .279 (.013) | .332 (.021) |
| 2,300 | .145 (.017) | .337 (.021) | .320 (.011) | .336 (.020) | .280 (.012) | .339 (.019) |
| 2,350 | .147 (.015) | .339 (.018) | .320 (.013) | .336 (.023) | .280 (.013) | .339 (.020) |
| 2,400 | .149 (.018) | .338 (.021) | .321 (.012) | .337 (.020) | .282 (.012) | .342 (.019) |
| 2,450 | .147 (.015) | .339 (.021) | .321 (.012) | .342 (.019) | .281 (.012) | .344 (.016) |
| 2,500 | .145 (.015) | .338 (.018) | .320 (.013) | .338 (.021) | .280 (.013) | .341 (.020) |
| 2,550 | .145 (.015) | .339 (.018) | .320 (.012) | .338 (.021) | .281 (.012) | .343 (.019) |
| 2,600 | .147 (.013) | .340 (.020) | .321 (.012) | .342 (.020) | .281 (.012) | .344 (.020) |
| 2,650 | .148 (.013) | .342 (.021) | .322 (.012) | .343 (.019) | .282 (.012) | .345 (.018) |
| 2,700 | .145 (.014) | .343 (.018) | .321 (.012) | .346 (.017) | .281 (.012) | .349 (.018) |
| 2,750 | .147 (.013) | .346 (.018) | .321 (.011) | .345 (.019) | .282 (.010) | .348 (.016) |
| 2,800 | .145 (.014) | .345 (.020) | .320 (.011) | .347 (.018) | .280 (.011) | .349 (.017) |
| 2,850 | .146 (.013) | .344 (.014) | .322 (.010) | .349 (.016) | .282 (.010) | .350 (.016) |
| 2,900 | .147 (.012) | .346 (.018) | .320 (.010) | .348 (.016) | .281 (.010) | .349 (.016) |

continued

950

.149

(.01

2).3

48(.

016)

.322

(.01

0).3

50(.

016)

.283

(.01

0).3

52(.

015)

3,00

0.1

46(.

013)

.347

(.01

6).3

21(.

010)

.351

(.01

5).2

81(.

010)

.353

(.01

5)3,

050

.147

(.01

3).3

52(.

014)

.323

(.01

0).3

52(.

016)

.283

(.01

0).3

55(.

016)

3,10

0.1

47(.

013)

.350

(.01

3).3

24(.

008)

.355

(.01

4).2

84(.

009)

.356

(.01

4)3,

150

.147

(.01

2).3

50(.

015)

.321

(.00

9).3

52(.

016)

.282

(.00

9).3

55(.

015)

3,20

0.1

47(.

011)

.352

(.01

7).3

23(.

009)

.357

(.01

5).2

83(.

009)

.356

(.01

5)3,

250

.146

(.01

0).3

50(.

014)

.321

(.00

8).3

55(.

014)

.281

(.00

8).3

56(.

015)

3,30

0.1

47(.

011)

.353

(.01

4).3

22(.

008)

.358

(.01

5).2

83(.

008)

.359

(.01

4)3,

350

.147

(.00

9).3

51(.

014)

.322

(.01

0).3

56(.

015)

.283

(.00

9).3

58(.

014)

3,40

0.1

45(.

010)

.352

(.01

2).3

20(.

009)

.355

(.01

4).2

81(.

009)

.357

(.01

4)3,

450

.146

(.00

9).3

54(.

013)

.322

(.00

7).3

58(.

013)

.282

(.00

8).3

59(.

014)

3,50

0.1

45(.

010)

.353

(.01

3).3

20(.

008)

.358

(.01

4).2

81(.

008)

.358

(.01

4)3,

550

.147

(.01

0).3

54(.

013)

.322

(.00

7).3

59(.

012)

.283

(.00

8).3

61(.

011)

3,60

0.1

46(.

010)

.357

(.01

2).3

22(.

008)

.362

(.01

2).2

83(.

008)

.363

(.01

2)

cont

inue

d

414 PERSONNEL PSYCHOLOGYTA

BL

E4

(con

tinue

d)

Val

idity

coef

ficie

nts

Tri

ple

cros

s-va

lidat

ed

Rat

iona

lR

atio

nal

Em

piri

cal

Em

piri

cal

Hyb

rid

Hyb

rid

key

unit

key

step

wis

eke

yun

itke

yst

epw

ise

key

unit

key

step

wis

ew

eigh

ting

wei

ghtin

gw

eigh

ting

wei

ghtin

gw

eigh

ting

wei

ghtin

g

nM

ean

(SD

)M

ean

(SD

)M

ean

(SD

)M

ean

(SD

)M

ean

(SD

)M

ean

(SD

)

3,65

0.1

46(.

009)

.354

(.01

4).3

21(.

008)

.360

(.01

5).2

81(.

008)

.361

(.01

3)3,

700

.146

(.00

8).3

55(.

011)

.321

(.00

7).3

61(.

012)

.282

(.00

7).3

61(.

011)

3,75

0.1

46(.

009)

.357

(.01

1).3

23(.

007)

.362

(.01

2).2

83(.

007)

.364

(.01

1)3,

800

.148

(.00

7).3

60(.

011)

.323

(.00

7).3

64(.

014)

.284

(.00

7).3

65(.

011)

3,85

0.1

46(.

009)

.357

(.01

1).3

23(.

006)

.364

(.01

2).2

83(.

006)

.364

(.01

2)3,

900

.147

(.00

9).3

60(.

012)

.322

(.00

7).3

65(.

012)

.283

(.00

7).3

66(.

013)

3,95

0.1

44(.

009)

.360

(.01

2).3

22(.

007)

.366

(.01

1).2

83(.

006)

.365

(.01

1)4,

000

.148

(.00

8).3

62(.

011)

.324

(.00

7).3

68(.

010)

.285

(.00

7).3

69(.

010)

4,05

0.1

47(.

007)

.362

(.01

1).3

23(.

006)

.368

(.01

1).2

83(.

006)

.368

(.01

0)4,

100

.147

(.00

8).3

62(.

011)

.323

(.00

7).3

68(.

012)

.284

(.00

6).3

69(.

010)

4,15

0.1

47(.

007)

.361

(.01

1).3

23(.

006)

.368

(.01

0).2

84(.

006)

.370

(.01

0)4,

200

.147

(.00

8).3

62(.

010)

.323

(.00

5).3

68(.

010)

.284

(.00

6).3

69(.

011)

4,25

0.1

46(.

007)

.364

(.00

8).3

24(.

006)

.369

(.01

1).2

84(.

006)

.369

(.01

0)4,

300

.147

(.00

7).3

62(.

009)

.322

(.00

5).3

68(.

009)

.283

(.00

5).3

70(.

009)

4,35

0.1

47(.

006)

.363

(.01

0).3

23(.

006)

.370

(.00

9).2

84(.

006)

.371

(.00

9)4,

400

.147

(.00

6).3

64(.

009)

.323

(.00

5).3

70(.

010)

.284

(.00

5).3

70(.

009)

4,45

0.1

47(.

006)

.365

(.00

9).3

23(.

005)

.371

(.00

9).2

84(.

005)

.372

(.00

9)4,

500

.146

(.00

6).3

64(.

010)

.322

(.00

6).3

69(.

011)

.283

(.00

5).3

70(.

010)

4,55

0.1

46(.

006)

.364

(.00

7).3

23(.

004)

.371

(.00

7).2

83(.

004)

.372

(.00

7)4,

600

.148

(.00

5).3

66(.

008)

.323

(.00

4).3

72(.

008)

.284

(.00

3).3

73(.

009)

cont

inue

d

JEFFREY M. CUCINA ET AL. 415

TAB

LE

4(c

ontin

ued)

Val

idity

coef

ficie

nts

Tri

ple

cros

s-va

lidat

ed

Rat

iona

lR

atio

nal

Em

piri

cal

Em

piri

cal

Hyb

rid

Hyb

rid

key

unit

key

step

wis

eke

yun

itke

yst

epw

ise

key

unit

key

step

wis

ew

eigh

ting

wei

ghtin

gw

eigh

ting

wei

ghtin

gw

eigh

ting

wei

ghtin

g

nM

ean

(SD

)M

ean

(SD

)M

ean

(SD

)M

ean

(SD

)M

ean

(SD

)M

ean

(SD

)

4,65

0.1

46(.

005)

.365

(.00

9).3

23(.

005)

.372

(.00

9).2

84(.

004)

.373

(.00

8)4,

700

.147

(.00

5).3

69(.

008)

.323

(.00

4).3

74(.

009)

.284

(.00

4).3

74(.

008)

4,75

0.1

47(.

005)

.366

(.00

7).3

23(.

005)

.373

(.00

9).2

84(.

004)

.373

(.00

8)4,

800

.146

(.00

4).3

67(.

007)

.324

(.00

3).3

75(.

008)

.284

(.00

3).3

75(.

008)

4,85

0.1

47(.

004)

.368

(.00

8).3

23(.

003)

.375

(.00

7).2

84(.

003)

.375

(.00

7)4,

900

.147

(.00

4).3

69(.

007)

.323

(.00

3).3

74(.

007)

.284

(.00

3).3

76(.

006)

4,95

0.1

46(.

003)

.367

(.00

8).3

23(.

003)

.374

(.00

8).2

84(.

003)

.374

(.00

7)5,

000

.147

(.00

3).3

70(.

007)

.324

(.00

3).3

76(.

007)

.285

(.00

3).3

76(.

008)

5,05

0.1

47(.

003)

.370

(.00

7).3

23(.

003)

.375

(.00

7).2

84(.

003)

.377

(.00

7)5,

100

.147

(.00

3).3

70(.

006)

.323

(.00

3).3

77(.

007)

.284

(.00

2).3

77(.

006)

5,15

0.1

47(.

002)

.369

(.00

6).3

23(.

003)

.376

(.00

6).2

84(.

002)

.377

(.00

7)5,

200

.147

(.00

2).3

70(.

006)

.323

(.00

2).3

77(.

006)

.284

(.00

2).3

78(.

006)

5,25

0.1

47(.

001)

.371

(.00

5).3

23(.

002)

.378

(.00

6).2

84(.

001)

.379

(.00

6)5,

272

.147

(.00

0).3

71(.

006)

.323

(.00

2).3

78(.

006)

.284

(.00

1).3

79(.

006)

Not

e.V

alue

sin

cells

repr

esen

tth

em

ean

trip

lecr

oss-

valid

ity(o

rva

lidity

inth

eca

seof

ratio

nal

keyi

ngw

ithun

itw

eigh

ted

item

s)at

each

sam

ple

size

.T

hesa

mpl

esi

zes

repr

esen

tth

eto

tal

num

ber

ofca

ses

befo

reth

esa

mpl

ew

assp

litin

toth

irds

(for

the

trip

lecr

oss-

valid

atio

n).T

hest

anda

rdde

viat

ions

ofth

e10

0tr

iple

cros

s-va

liditi

esfo

rea

chre

plic

atio

nat

each

sam

ple

size

are

show

nin

pare

nthe

ses.

1E

mpi

rcal

and

hybr

idsc

ale

valid

ities

base

don

only

the

cros

s-va

lidat

ion

sam

ple

atea

chsa

mpl

esi

ze.

2R

atio

nal

scal

eva

liditi

esba

sed

onth

efu

llsa

mpl

e(i

.e.,

the

com

bina

tion

ofth

ecr

oss-

valid

atio

nan

dde

velo

pmen

tals

ampl

es)

atea

chsa

mpl

esi

ze.∗ p

<.0

5;∗∗

p<

.01.


Figure 2: The Triple Cross-Validities of the Main Types of Biodata Scoring, Plotted as a Function of Sample Size.

Note. The data points presented in this graph correspond to those presented in Table 4, which are the mean (across all 100 replications) triple cross-validities at each sample size. The sample sizes listed are the sizes of the total combined sample (i.e., the combined developmental and cross-validation samples). Note that the rational unit-weighted method (which was the only method that does not capitalize on chance) was not triple cross-validated.

The results are presented in six different panels, along with plots depicting the standard deviation of triple cross-validity coefficients about each mean.

Research Question #1: Which Scoring Approach (i.e., Rational, Empirical, or Hybrid) Yields the Highest Criterion-Related Validity?

In general, rational keying with unit weighting tended to yield criterion-related validities that were much lower than the empirical and hybrid keying approaches. As seen in Table 4 and Figure 2, the rational key's validity remained quite steady across the different sample sizes, hovering around .15. At sample sizes lower than about 1,600 cases, the empirical key with unit weights method yielded the highest triple cross-validities, with a few exceptions (mainly at the lowest sample sizes, where hybrid keying performed slightly better). Above that point, using stepwise regression to weight items (as opposed to unit weighting) led to higher triple cross-validities. In general, at sample sizes larger than about 1,600 cases, using hybrid keying with stepwise regression weighting of items yielded the highest criterion-related validities. However, this method


Figure 3: Mean (Across All 100 Replications) Triple Cross-Validities Plotted as a Function of Sample Size.

Note. The plots are presented separately in six panels, one for each of the main types of biodata scoring. The standard deviation around each mean (each based on the 100 replications) was computed for every sample size. Lines showing the upper and lower bounds of an interval of plus or minus two standard deviations around each mean are plotted.

did not yield validities that were substantially larger than the rational keying with stepwise regression weights (at sample sizes between 1,650 and 2,500 cases) and the empirical keying with stepwise regression weights (at sample sizes of 3,200 cases or larger).
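To make the item-level weighting contrast concrete, the following is a minimal sketch of stepwise regression weighting of items in plain NumPy. It is an illustration, not the authors' implementation: the greedy forward-selection loop, the `min_gain` stopping rule, and the synthetic data are all assumptions introduced for the example.

```python
import numpy as np

def forward_stepwise_weights(X, y, min_gain=0.001):
    """Greedy forward selection: repeatedly add the item whose inclusion
    most improves R^2; stop when the gain drops below min_gain.
    Returns the selected item indices and their OLS weights."""
    n, p = X.shape
    selected, best_r2 = [], 0.0
    while len(selected) < p:
        gains = np.full(p, -np.inf)
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            gains[j] = (1 - resid.var() / y.var()) - best_r2
        j_best = int(np.argmax(gains))
        if gains[j_best] < min_gain:
            break
        selected.append(j_best)
        best_r2 += gains[j_best]
    A = np.column_stack([np.ones(n), X[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return selected, coef  # coef[0] is the intercept

# Illustrative use on synthetic "item scores" and a criterion
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))            # 10 items, 500 cases
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)
selected, weights = forward_stepwise_weights(X, y, min_gain=0.01)
```

Unit weighting, by contrast, simply sums the keyed item scores with every item weighted 1.0, which is part of why it stabilizes at far smaller sample sizes.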

As shown in Table 4 and Figure 3, the standard deviation of triple cross-validities was quite large at the smaller sample sizes. When using unit weighting of items, the hybrid approach (although yielding lower


validities) had less variability in triple cross-validities than the empirical approach at sample sizes lower than about 750 cases. Thus, it appears that hybridization reduced the variability in the cross-validity coefficients at the smaller sample sizes. At the larger samples (e.g., with 1,600 cases or more), hybridization did not have a substantial impact on the standard deviation of triple cross-validities.

In summary, it appears that empirical keying with unit weights yields the highest validities at sample sizes of 1,600 cases or less; however, hybrid keying with unit weights had less variability in validities at smaller sample sizes. At larger sample sizes, stepwise regression yielded the highest validities, and the hybrid keying approach tended to very slightly outperform the empirical and rational approaches.

Research Question #2: Do the Different Empirical Keying Procedures (e.g., Vertical Percent, Point Biserial, Mean Criterion, etc.) Have Different Criterion-Related Validities?

In general, the different option-level empirical keying procedures yielded similar cross-validities. In order to evaluate the differences in cross-validities, each biodata scoring method was separately paired with each of the remaining methods, and tests were conducted to evaluate the significance of the differences in cross-validities. The differences in magnitude of each pair of cross-validities for the different keying methods were assessed using Meng, Rosenthal, and Rubin's (1992) Z statistic³

for comparing two dependent correlation coefficients, with a Bonferroni correction to control for the large number of comparisons. In addition, the effect size of the difference between two correlation coefficients was assessed using the q statistic, which is described by Cohen (1992). According to Cohen (1992), the values of .10, .30, and .50 correspond to small, medium, and large effect sizes for the q statistic. No comparisons had an effect size of .50 or higher or −.50 or lower.
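Both statistics are short computations. The sketch below follows the formulas as usually stated (Meng, Rosenthal, & Rubin, 1992, for the dependent-correlations Z; Cohen, 1992, for q); the correlations plugged into the example are taken from Table 5 and are purely illustrative.

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def cohens_q(r1, r2):
    """Cohen's q: difference between two Fisher-transformed correlations
    (.10 / .30 / .50 = small / medium / large effect)."""
    return fisher_z(r1) - fisher_z(r2)

def meng_z(r1, r2, r12, n):
    """Meng, Rosenthal, & Rubin (1992) Z for two dependent correlations:
    r1 = corr(x1, y), r2 = corr(x2, y), r12 = corr(x1, x2), n cases."""
    rbar_sq = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r12) / (2 * (1 - rbar_sq)), 1.0)   # f is capped at 1
    h = (1 - f * rbar_sq) / (1 - rbar_sq)
    return (fisher_z(r1) - fisher_z(r2)) * math.sqrt(
        (n - 3) / (2 * (1 - r12) * h))

# Example: empirical unit (.323) vs. hybrid unit (.284) at n = 5,272
q = cohens_q(.323, .284)   # about .04 -- well below the "small" cutoff
```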

The vast majority of the differences in cross-validities occurred between the different approaches, classes, and the two families of item-level weighting to biodata scoring, as opposed to the option-level families and methods. For example, there were very few significant differences between the cross-validities of the empirical keying methods (e.g., point biserial raw weights vs. vertical 5%, etc.) when the same item-level weighting family (e.g., unit weighting of items) and approach (e.g., empirical approach) were used. In contrast, when the cross-validities of the

³Similar results have also been found in other studies where a single empirical key for a biodata instrument was applied across various organizations (e.g., Carlson, Scullen, Schmidt, Rothstein, & Erwin, 1999; Rothstein, Schmidt, Erwin, Owens, & Sparks, 1990).


empirical keying procedures using one approach (e.g., the hybrid approach) were compared to their counterparts using another approach (e.g., the empirical approach), there were more significant differences. In summary, it appears that the different option-level empirical keying methods yielded similar criterion-related validities.

Research Question #3: Which Biodata Scoring Procedures Should Practitioners Use?

In general, there were very few differences between the option-level empirical keying methods studied. All yielded similar cross-validities (with the exception of the mean criterion method at smaller sample sizes), and all were highly intercorrelated (see Research Question 4 below). Because there was little difference in the cross-validities of the empirical keying methods, the use of any of them should be justifiable. However, it might make sense for practitioners to use the simplest method of empirical keying. The vertical and horizontal percent families are somewhat cumbersome, as they require splitting the dataset into high and low criterion groups. The other families avoid this step, making them slightly more efficient. Researchers might consider using the point biserial raw weights method because it can more easily be performed in statistical programs such as SPSS. In addition, point biserial correlation coefficients are standard output in many item analysis programs. Weights for the mean criterion method can also easily be obtained; however, our results (as well as those of Devlin et al., 1992) suggest that this method exhibits more shrinkage at small sample sizes. This being said, the results of this study should not be taken as a recommendation that practitioners abandon the use of the non-point-biserial option-level keying methods. These methods do yield criterion-related validity, and their use is justifiable, albeit more cumbersome than the point biserial raw weights method.
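As one plausible rendition of the point biserial raw weights method, each response option can be weighted by the point-biserial correlation between endorsing that option (coded 0/1) and the criterion. The data layout and function names below are assumptions made for the sketch, not the authors' code.

```python
import numpy as np

def point_biserial_option_weights(responses, criterion, n_options):
    """responses: (n_people, n_items) integer option choices in [0, n_options).
    Returns an (n_items, n_options) matrix of raw point-biserial weights."""
    n, k = responses.shape
    weights = np.zeros((k, n_options))
    for item in range(k):
        for opt in range(n_options):
            endorsed = (responses[:, item] == opt).astype(float)
            if endorsed.std() == 0:        # option never (or always) chosen
                continue
            # point-biserial r = Pearson r with a dichotomous variable
            weights[item, opt] = np.corrcoef(endorsed, criterion)[0, 1]
    return weights

def score(responses, weights):
    """Each person's score: sum of the keyed option weights across items."""
    items = np.arange(responses.shape[1])
    return weights[items, responses].sum(axis=1)

# Illustrative data: option 0 of item 0 carries the criterion signal
rng = np.random.default_rng(1)
resp = rng.integers(0, 4, size=(200, 3))
crit = (resp[:, 0] == 0).astype(float) + rng.normal(scale=0.1, size=200)
w = point_biserial_option_weights(resp, crit, 4)
```

In practice the key would be built on a developmental sample and the scores checked for shrinkage on a holdout, per the triple cross-validation design described earlier.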

Research Question #4: Do the Different Biodata Scoring Procedures Yield Similar (i.e., Highly Correlated) Scores?

In Phase 1, scores from the various keying procedures were positively correlated in both developmental and cross-validation samples. The magnitude of the correlations was often quite high, ranging from the low .40s to the .90s for most of the procedures studied. In order to evaluate the pattern of intercorrelations among the keying procedures, a PCA was conducted for each replication at each sample size. The PCAs, in conjunction with a bootstrap parallel analysis (using O'Connor's, 2000, program), strongly suggest the presence of two components in the data. After both varimax and oblimin rotations, it appeared that one principal component


represented the sum scores and the second represented the stepwise scores. Furthermore, in the last column of Table 3, we present the median intercorrelations between each method and other similar methods. These intercorrelations were very high, typically in the .90s. Thus, it appears that the different scoring methods yield similar (i.e., highly correlated) scores.
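O'Connor's (2000) program is SPSS/SAS syntax; the NumPy sketch below illustrates the same underlying idea of parallel analysis: retain only those leading components whose eigenvalues exceed what random data of the same shape would produce. The random-normal simulation and 95th-percentile rule are common choices, not necessarily the exact settings used in the study.

```python
import numpy as np

def parallel_analysis(data, n_draws=100, percentile=95, seed=0):
    """Horn-style parallel analysis: count leading components whose
    observed eigenvalues exceed the given percentile of eigenvalues
    from random normal data of the same n-by-p shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.empty((n_draws, p))
    for i in range(n_draws):
        sim = rng.normal(size=(n, p))
        rand[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False)))[::-1]
    thresh = np.percentile(rand, percentile, axis=0)
    n_keep = 0
    for o, t in zip(obs, thresh):   # stop at the first failing component
        if o > t:
            n_keep += 1
        else:
            break
    return n_keep

# Two latent components among six variables -> expect two retained
rng = np.random.default_rng(2)
f1, f2 = rng.normal(size=(2, 300))
noise = lambda: 0.3 * rng.normal(size=300)
data = np.column_stack([f1 + noise(), f1 + noise(), f1 + noise(),
                        f2 + noise(), f2 + noise(), f2 + noise()])
```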

Research Question #5: Does Sample Size Impact the Validity of the Different Biodata Scoring Approaches?

As shown in Figure 2, there is a relationship between sample size and triple cross-validities for the empirical and hybrid keying approaches with unit weighting of items. A strong positive relationship exists up to about 500 cases, after which the validities begin to asymptote. When stepwise regression is used to weight items, the relationship between sample size and validity is positive throughout the entire range of sample sizes; however, when more than about 1,600 cases are used, the increases in validities associated with sample size begin to diminish. For example, when going from 50 to 1,600 cases, the hybrid keying with stepwise regression weighting of items method yields validities of .039 and .319, respectively (an increase of .280). Going from 1,600 cases to the full sample size of 5,272 cases only increases the validity from .319 to .379 (an increase of .060). Finally, as expected, the validity of the rational keying with unit weights method did not substantially depend on sample size.

Research Question #6: What Are the Sample Size Requirements for Empirical and Hybrid Keying?

In order to determine the sample size requirements for the keying procedures, we estimated the power of detecting a statistically significant cross-validity coefficient at each sample size. For each of the 100 replications at every sample size, we compared the triple cross-validity coefficient to the critical value for a two-tailed significance test of a correlation coefficient. We used the percentage of replications that were statistically significant as an estimate of the power for that particular sample size.⁴
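The mechanics of that power estimate can be sketched as follows. The normal approximation to the t critical value and the made-up replication values are assumptions for illustration; the authors' exact computation (including which n enters the significance test) may differ.

```python
import math

def critical_r(n):
    """Approximate two-tailed .05 critical correlation for H0: rho = 0.
    Uses z = 1.96 in place of the exact t critical value, which is
    accurate at the sample sizes studied here (r = t / sqrt(t^2 + n - 2))."""
    z = 1.959964
    return z / math.sqrt(n - 2 + z ** 2)

def empirical_power(cross_validities, n):
    """Proportion of replications whose cross-validity is significant."""
    rc = critical_r(n)
    return sum(abs(r) > rc for r in cross_validities) / len(cross_validities)

# Hypothetical replication values: 80 of 100 clear the critical value
reps = [0.15] * 80 + [0.05] * 20
power = empirical_power(reps, 400)   # -> 0.8
```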

In Table 5, the minimum sample sizes (in increments of 50 cases) that yielded powers of 80% and 90% are displayed for each of the six keying procedures, along with the mean triple cross-validity coefficients at those sample sizes. For comparison purposes, we present the validity at the largest sample size in the right column of Table 5. We should note that increasing the sample size beyond the minimum values in Table 5 did increase validity, although the increase is curvilinear and begins to asymptote. Future biodata developers may wish to compute the change in utility for the job they are studying associated with increasing sample sizes beyond the minimum values we obtained.

⁴As a check of the viability of our approach, we compared our estimated power for the rational keying with unit weighting method to the sample size requirements in Cohen (1988) and found very similar values.

TABLE 5
Sample Size Requirements for Rational, Empirical, and Hybrid Keying

                                          80% power               90% power
                                     n required   Validity   n required   Validity   Validity at n = 5,272
Rational key unit weighting (a)         400        .152         500        .153             .147
Rational key stepwise weighting         300        .189         350        .211             .371
Empirical key unit weighting            150        .261         200        .269             .323
Empirical key stepwise weighting        300        .163         400        .192             .378
Hybrid key unit weighting               150        .241         200        .248             .284
Hybrid key stepwise weighting           300        .178         350        .200             .379

(a) Empirical values are presented; similar values were obtained from a power analysis using tables in Cohen (1988).

Research Question #7: Does Hybrid Keying Decrease the Sample Size Requirements?

In general, when unit weighting of items was used, hybrid keying had some beneficial effects on criterion-related validity at smaller sample sizes; however, it did not substantially change the sample size requirements needed to obtain a specific level of validity. On average, empirical keying with unit weighting had higher criterion-related validities than hybrid keying, except at the two smallest sample sizes. However, there was more variability in the empirical keying cross-validities at the smaller sample sizes. In essence, it appears that a biodata developer using empirical keying for a small sample has to take a "gamble," as the criterion-related validity varies from sample to sample. Using hybrid keying reduces the variability; however, it also slightly reduces the average criterion-related validity. As the sample size increases, there is less variability in the criterion-related validities for empirical keying. Therefore, from a statistical standpoint, hybrid keying has some benefits at lower sample sizes but in general yields


lower criterion-related validity than empirical keying. When stepwise regression weighting of items was used, hybrid keying did yield slightly higher validities than empirical keying. That said, stepwise regression did not outperform unit weighting of items until 1,600 cases were used. Thus, hybrid keying only decreased the sample size requirements when stepwise regression and over 1,600 cases were used.

Discussion

This study extended research by Devlin et al. (1992) and Mitchell and Klimoski (1982) by comparing the validity of empirical, rational, and hybrid keying procedures at different sample sizes. In general, it appears that empirical keying yielded higher cross-validities than hybrid and rational keying at lower sample sizes when items were unit weighted. Hybrid keying increased the stability of cross-validities at low sample sizes but still had a lower cross-validity (on average) when the sample size was very low. When stepwise regression was used to differentially weight items at larger sample sizes (over 1,600 cases), hybrid keying tended to slightly outperform the empirical and rational procedures, except at the largest sample sizes studied. One sobering finding was the low criterion-related validity obtained with rational keying. Similar to Mitchell and Klimoski, we found that empirical keying outperformed rational keying when items were unit weighted. The best estimate of the validity of the rational key (r = .147, from the largest sample) was much lower than the empirical key validity and would fall toward the bottom third of Schmidt and Hunter's (1998) list of validities for selection instruments. This finding is consistent with the criterion-related validity of other rationally developed self-report measures (e.g., rationally keyed personality measures and the point method of training and experience measures). As pointed out by an anonymous reviewer, it is also consistent with the established notion that statistical procedures have a history of outpredicting clinical judgments (which use a process that is similar to rational keying).

Hybrid scoring has been offered as a compromise between the empirical and rational approaches. Past researchers have advocated the use of hybrid procedures, yet little published empirical research exists on this topic. In our study, hybrid scoring yielded validity coefficients that lie between those of the empirical and rational approaches when items were unit weighted at sample sizes below 1,600 cases. Although hybrid keying did not increase criterion-related validity, it did reduce the variability in cross-validities at lower sample sizes (compared to empirical keying).

Despite our empirical findings, there are several nonstatistical benefits to the use of hybrid keying. First, if the scoring key for a biodata instrument


was legally challenged, hybrid keying may be somewhat easier to defend, given the use of rationally guided option weights. Pure empirical option-level keying often results in at least a handful of response option weights that do not make rational sense. Although most of the response option weights work correctly (presuming that the scoring key cross-validates), there are some that will be nonsensical. Using a hybrid approach prevents these nonsensical weights from becoming part of the final inventory. Furthermore, response option weights that are nonsensical will have low face validity, even if they have predictive validity. Explaining how empirical keys are created to a layperson is a difficult task. We purport that the more a scoring key deviates from either lay or psychological theory, the lower its face validity (even if it has predictive validity). Thus, a purely empirical approach (although defensible) could require work to educate opposing parties. Hybrid keying helps to ensure that the scoring key makes sense and is interpretable and explainable. It also can provide biodata developers with a better understanding of how biodata items work. In essence, hybrid keying forces a biodata developer to look closely at his or her items. This helps the biodata developer to learn about the items and build theories, which is helpful for future biodata inventory development.

Our study has important practical implications for biodata developers. Using our results, biodata developers will be better informed when making decisions regarding sample sizes. These types of decisions often require a balance between adequate power and minimizing burden on an organization. In addition, given the large number of existing option-level keying methods, a biodata developer often had little guidance on which method to choose prior to our study. To address this problem, a decision tree is presented in Figure 4, which biodata developers can use to determine which scoring procedure to use (and the associated sample size requirements) when developing a biodata inventory.

Most of our practical implications have been targeted toward personnel psychologists. Psychologists in areas outside of I-O psychology might also be interested in this research, as several nonbiodata instruments (e.g., the MMPI) are developed using empirical procedures. Using the information we have provided on sample size requirements, scoring procedures, and so on, psychologists in other areas can use empirical approaches to refine existing instruments or in the development of new instruments. Empirical scoring approaches have been and can be used for other types of instruments besides biodata. For example, empirical scoring approaches can be used with situational judgment tests (e.g., Bergman, Drasgow, Donovan, & Juraska, 2003; Dalessio, 1994; Weekley & Jones, 1997) and personality inventories (e.g., Cucina, 2007; Cucina, Vasilopoulos, & McElreath, 2008; Davis, 1997, 2001, 2002; Mead, 2000; Russell, personal communication,


Figure 4: Decision Tree for Determining Sample Size Requirements and Identifying Biodata Scoring Methodology.

Note. The decision tree is intended to be a rough summary of our findings and a guide for practitioners and biodata developers. As with any general guide, there are a number of other factors that could be taken into account when making decisions about developing a biodata scoring key. Thus, the decision tree should be viewed as a pedagogical tool as opposed to a set of absolute rules.

April 15, 2003; Thayer, 1977; Wallace, Clark, & Dry, 1956). They can also be used in clinical and counseling settings and could be extended to areas of prediction outside of psychology.

There are a few limitations to this study. First, the biodata items used were designed to be representative of the different possible types of biodata items and were developed based on theories and hypotheses about the relationships between life experiences and future job performance. Our results may not extend to other biodata item pools that are less theory based, use a narrower range of item types, or have a dramatically different sample. Another limitation was the use of incumbents, who are less likely to fake compared to applicants. Furthermore, we did not include the configural scoring procedure in our study (Meehl, 1950; see also Lubin & Osburn, 1957, 1960, and Osburn & Lubin, 1957) due to sample size limitations (at the lower sample sizes we studied) and the fact that this method is rarely seen in the academic and applied literature.

Future researchers may wish to compare the cross-validity of the quasi-rational (Mael & Hirsch, 1993) and factor analytic approaches to the rational, empirical, and hybrid approaches. Furthermore, the construct


validity of scores using different keying procedures could be explored. Although empirical keying yields higher criterion-related validities than rational keying, it could be possible that rational keying yields scores with better construct validity. The fakability of different keying procedures could also be investigated. Empirical keys can reduce score inflation due to faking (Kluger, Reilly, & Russell, 1991; Mumford & Stokes, 1992); however, the impact of faking on hybrid measures has not been studied. Finally, future researchers could further examine the validity of empirically keyed situational judgment and personality instruments.

REFERENCES

Aamodt MG, Pierce WL. (1987). Comparison of the rare response and vertical percentmethods for scoring the biographical information blank. Educational and Psycho-logical Measurement, 47, 505–511.

Bergman ME, Drasgow F, Donovan, MA, Juraska SE. (2003). Scoring of SituationalJudgment Tests. Paper presented at the 18th Annual Conference of the Society forIndustrial and Organizational Psychology, Orlando, FL.

Bergman ME, Drasgow F, Donovan, MA, Henning JB, Juraska SE. (2006). Scoring sit-uational judgment tests: Once you get the data, your troubles begin. InternationalJournal of Selection and Assessment, 14(3), 223–235.

Brown SH. (1994). Validating biodata. In Stokes GS, Mumford MD, Owens WA (Eds.),Biodata handbook. Palo Alto, CA: CPP.

Carlson KD, Scullen SE, Schmidt FL, Rothstein H, Erwin F. (1999). Generalizable bio-graphical data validity can be achieved without multi-organizational developmentand keying. PERSONNEL PSYCHOLOGY, 52(3), 731–755.

Cascio WF. (1982). Applied psychology in personnel management (2nd ed.). Reston, VA:Reston.

Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Erlbaum.

Cohen J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.Cucina JM. (2007, June). A comparison of alternative methods of scoring a broad-

bandwidth personality inventory to predict freshman GPA. International Public Man-agement Association for Human Resources Assessment Council (IPMAAC) StudentPaper Competition Winner, 2007. Paper presented at the 31st Meeting of IPMAAC,St. Louis, MO.

Cucina JM, Bayless JM, Thibodeaux HF, Busciglio HH, MacLane CN. (2009, April). Howto score biodata measures: A master tutorial. Master tutorial presented at the 24thMeeting of the Society for Industrial and Organizational Psychology, New Orleans,LA.

Cucina JM, Vasilopoulos NL, McElreath JM. (2008, April). Using empirical keying toscore personality measures. Paper presented at the 23rd Meeting of the Society forIndustrial and Organizational Psychology, San Francisco, CA.

Dalessio AT. (1994). Predicting insurance agent turnover using a video-based situationaljudgment test. Journal of Business and Psychology, 9, 23–32.

Davis B. (2001, June). Empirical keying of personality measures to maximize validity.Paper presented at the 25th Annual IPMAAC Conference on Professional PersonnelAssessment, Newport Beach, CA.

Davis BW. (1997). An integration of biographical data and personality research through Sherwood Forest empiricism: Robbing from personality to give to biodata. Unpublished doctoral dissertation, Louisiana State University.

Davis BW. (2002, April). Empirical keying an existing personality measure to improve predictive validity. Paper presented at the 17th Annual Conference of the Society for Industrial and Organizational Psychology, Toronto, Canada.

Dean MA, Russell CJ. (2001, August). Bootstrap cross-validation efficiencies in personnel selection. Paper presented at the Annual Academy of Management Meeting, Washington, DC.

Dean MA, Russell CJ, Farmer W. (2002). Noncognitive predictors of air traffic controller performance. In Eibfeldt H, Heil M, Broach D (Eds.), Staffing the air traffic management (ATM) system: The selection of air traffic controllers (pp. 59–72). Brookfield, VT: Ashgate.

Devlin SE, Abrahams NM, Edwards JE. (1992). Empirical keying of biographical data: Cross-validity as a function of scaling procedure and sample size. Military Psychology, 4(3), 119–136.

Dunnette MD. (1962). Personnel management. Annual Review of Psychology, 13, 285–314.

England GW. (1961). Development and use of weighted application blanks. Minneapolis, MN: Industrial Relations Center, University of Minnesota.

England GW. (1971). Development and use of weighted application blanks. Minneapolis, MN: Industrial Relations Center, University of Minnesota.

Gandy JA, Dye DA, MacLane CN. (1994). Federal government selection: The Individual Achievement Record. In Stokes GS, Mumford MD, Owens WA (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction. Palo Alto, CA: CPP.

Gandy JA, Outerbridge AN, Sharf JC, Dye DA. (1989). Development and initial validation of the Individual Achievement Record (IAR) (PRD-90-01). Washington, DC: U.S. Office of Personnel Management.

Goldberg LR. (1972). Parameters of personality inventory construction and utilization: A comparison of prediction strategies and tactics. Multivariate Behavioral Research Monographs, 72(2), 1–59.

Guion RM. (1965). Personnel testing. New York, NY: McGraw-Hill.

Guttman L. (1941). An outline of the statistical theory of prediction. In Horst P (Ed.), The prediction of personal adjustment. New York, NY: Social Science Research Council.

Hogan JB. (1994). Empirical keying of background data measures. In Stokes GS, Mumford MD, Owens WA (Eds.), Biodata handbook. Palo Alto, CA: CPP.

Hough L, Paullin C. (1994). Construct-oriented scale construction: The rational approach. In Stokes GS, Mumford MD, Owens WA (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction. Palo Alto, CA: CPP.

Hunter JE, Hunter RF. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96(1), 72–98.

Karas M, West J. (1999). Construct-oriented biodata development for selection to a differentiated performance domain. International Journal of Selection and Assessment, 7, 86–96.

Kelley TL. (1939). The selection of upper and lower groups for the validation of test items.Journal of Educational Psychology, 30(1), 17–24.

Kluger AN, Reilly RR, Russell CJ. (1991). Faking biodata tests: Are option-keyed instruments more resistant? Journal of Applied Psychology, 76(6), 889–896.

Lecznar WB, Dailey JT. (1950). Keying biographical inventories in classification test batteries. American Psychologist, 5, 279.

Lefkowitz J, Gebbia MI, Balsam T, Dunn L. (1999). Dimensions of biodata and their relationships to item validity. Journal of Occupational and Organizational Psychology, 72, 331–350.

Lubin A, Osburn HG. (1957). A theory of pattern analysis for the prediction of a quantitative criterion. Psychometrika, 22(1), 63–73.

Lubin A, Osburn HG. (1960). The use of configural analysis for the prediction of a qualitative criterion. Educational & Psychological Measurement, 20, 275–282.

Mael FA, Hirsch AC. (1993). Rainforest empiricism and quasi-rationality: Two approaches to objective biodata. PERSONNEL PSYCHOLOGY, 46(4), 719–738.

Malone MP. (1978). Predictive efficiency and discriminatory impact of verifiable biographical data as a function of data analysis procedure. Unpublished doctoral dissertation, University of Minnesota.

May K, Hittner JB. (1997). Tests for comparing dependent correlations revisited: A Monte Carlo study. Journal of Experimental Education, 65(3), 257–270.

Mead AD. (2000). Properties of a resampling validation technique for empirically scoring psychological assessments. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.

Meehl PE. (1950). Configural scoring. Journal of Consulting Psychology, 14, 165–171.

Meng XL, Rosenthal R, Rubin DB. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111(1), 172–175.

Miner JB. (1965). Studies in management education. New York, NY: Springer.

Mitchell TW, Klimoski RJ. (1982). Is it rational to be empirical? A test of methods for scoring biographical data. Journal of Applied Psychology, 67(4), 411–418.

Mumford MD, Owens WA. (1987). Methodology review: Principles, procedures, and findings in the application of background data measures. Applied Psychological Measurement, 11(1), 1–31.

Mumford MD, Stokes GS. (1992). Developmental determinants of individual action: Theory and practice in applying background measures. In Dunnette MD, Hough LM (Eds.), Handbook of industrial and organizational psychology. Palo Alto, CA: Consulting Psychologists Press.

O’Connor BP. (2000). SPSS, SAS, and MATLAB programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments, & Computers, 32, 396–402.

Olson-Buchanan JB, Drasgow F, Moberg PJ, Mead AD, Keenan PA, Donovan MA. (1998). Interactive video assessment of conflict resolution skills. PERSONNEL PSYCHOLOGY, 51, 1–24.

Osburn HG, Lubin A. (1957). The use of configural analysis for the evaluation of test scoring methods. Psychometrika, 22(4), 359–371.

Reiter-Palmon R, Connelly MS. (2000). Item selection counts: A comparison of empirical key and rational scale validities in theory-based and non-theory-based item pools. Journal of Applied Psychology, 85(1), 143–151.

Rothstein HR, Schmidt FL, Erwin FW, Owens WA, Sparks CP. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75(2), 175–184.

Schmidt FL, Hunter JE. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt FL, Hunter JE, Ury VW. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61(4), 473–485.

Schoenfeldt LF. (1999). From dust bowl empiricism to rational constructs in biographical data. Human Resource Management Review, 9, 147–167.

Sharf JC. (1994). The impact of legal and equal employment opportunity issues on personal history inquiries. In Stokes GS, Mumford MD, Owens WA (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction. Palo Alto, CA: CPP.

Stead WH, Shartle CL. (1940). Occupational counseling techniques: Their development and application. New York, NY: American.

Stokes GS, Hogan JB, Snell AF. (1993). Comparability of incumbent and applicant samples for the validation of biodata keys: The influence of social desirability. PERSONNEL PSYCHOLOGY, 46, 739–762.

Stokes GS, Searcy CA. (1999). Specification of scales in biodata form development: Rational vs. empirical and global vs. specific. International Journal of Selection and Assessment, 7(2), 72–85.

Strong EK. (1926). An interest test for personnel managers. Journal of Personnel Research,5, 194–203.

Taylor CW, Ellison RL. (1967). Biographical predictors of scientific performance. Science,155(3766), 1075–1080.

Telenson PA, Alexander RA, Barrett GV. (1983). Scoring the biographical information blank: A comparison of three weighting techniques. Applied Psychological Measurement, 7(1), 73–80.

Thayer PW. (1977). Somethings old, somethings new. PERSONNEL PSYCHOLOGY, 30, 513–524.

Tomkins SS, Miner JB. (1957). The Tomkins-Horn picture arrangement test. New York, NY: Springer.

U.S. Department of Labor. (1970). Manual for the USES general aptitude test battery. Section III: Development. Washington, DC: Author.

U.S. Department of Labor. (1974). Descriptive rating scale (Form MA-7-66, Rev. 3-74). Washington, DC: U.S. Department of Labor, Manpower Administration.

Wallace SR, Clarke WV, Dry RJ. (1956). The activity vector analysis as a selector of life insurance salesmen. PERSONNEL PSYCHOLOGY, 9, 337–344.

Weekley JA, Jones C. (1997). Video-based situational testing. PERSONNEL PSYCHOLOGY,50, 25–49.
