
Personality and Individual Differences 46 (2009) 552–556


The interpretation of the EPQ Lie scale scores under honest and faking instructions: A multiple-group IRT-based analysis

Pere J. Ferrando *, Cristina Anguiano-Carrasco
Universidad ‘Rovira i Virgili’, Facultad de Psicologia, Campus Sant Pere de Sacelades, C/Carretera de Valls, s/n 43007 Tarragona, Spain


Article history:
Received 2 June 2008
Received in revised form 9 December 2008
Accepted 15 December 2008
Available online 12 January 2009

Keywords:
Lie scale
Theta-shift hypothesis
Instructed faking
Multiple-group analysis

doi:10.1016/j.paid.2008.12.013

* Corresponding author. Tel.: +34 977558079; fax: +34 977558088. E-mail address: [email protected] (P.J. Ferrando).

Abstract

This study proposes and assesses a model-based general hypothesis on the interpretation of the EPQ Lie scores when obtained under standard conditions and under instructed faking. The base model is a multiple-group item-response-theory (IRT) model that is parameterized here as a factor-analytic model. A Spanish translation of the Lie scale was administered to a total group of 762 undergraduate students under standard instructions (401 respondents) and under faking-good instructions (361 respondents). Preliminary results at the total-score level were in agreement with previously reported evidence. Results obtained with the model-based analysis agreed with the general hypothesis, and suggested, among other things, that: (a) the Lie scores measure a common unitary factor under both types of conditions, (b) the measurement characteristics of the items remain essentially invariant in both conditions, and (c) the factor means change substantially in the expected direction. Limitations of the study, implications of the results, and future lines of research are discussed.


1. Introduction

There is still a "situational vs. traited" controversy regarding the interpretation of the scores on the "Lie" scales that most personality batteries include. According to the "situational" position, these scores mainly reflect temporary reactions that are a function of the demands of the testing situation and that are not related to substantive personality variance (e.g. Ellingson, Sackett, & Hough, 1999). On the other hand, the "traited" position considers that these scores reflect substantive personality characteristics that have a certain degree of consistency across time and situations (e.g. Furnham, 1986; McCrae & Costa, 1983; McFarland & Ryan, 2000). Lie scales in Eysenck's questionnaires, the measures considered here, have attracted a good deal of attention regarding this controversy (Furnham, 1986). In particular, this study is based on the 21-item Lie scale which is included in the Eysenck questionnaires for adults (EPQ-A, Eysenck & Eysenck, 1975, and EPQ-R; Eysenck, Eysenck, & Barrett, 1985).

While most Lie scales were developed from a purely "situational" position, with little regard to theory, and with the sole aim of detecting dissimulation, the EPQ Lie scale was developed using factor-analytic procedures and with the explicit aim of measuring a unitary trait (Eysenck & Eysenck, 1976, chap. 11). In agreement with this development, empirical evidence shows that the EPQ Lie scale has a clear positive-manifold factor structure which is essentially invariant across gender, and that its scores have an acceptable degree of internal consistency (Eysenck & Eysenck, 1975, 1976; Ferrando, Chico, & Lorenzo, 1997). Overall, when the Lie scale is administered using standard instructions and under neutral conditions in which there is low motivation for faking, there seems to be agreement that its scores measure essentially a unitary personality trait (Ferrando et al., 1997; Katz & Francis, 1991; Loo, 1995; Lajunen & Scherler, 1999). However, such consensus no longer exists regarding the labelling and conceptualization of this trait. Paulhus (1991) considers it as a variable of propensity to deliberately create a more positive image, and identifies it with the impression management component of social desirability. Eysenck and Eysenck (1975, 1976) conceptualize this trait as conformism or "social naiveté". McCrae and Costa (1983) consider that the Lie scores measure need for approval. Finally, some authors believe that the trait can be better conceptualized as lack of self-insight (Dicken, 1963; Eysenck, Eysenck, & Shaw, 1974).

Interpretation of the Lie scores under faking-good-motivation conditions (in high-stakes assessment or under appropriate instructions) is more complex. Eysenck and Eysenck (1975, 1976) hypothesized that, in this case, the Lie scale behaves as it should and serves to detect dissimulation. This double interpretation is quite general, and allows different hypotheses to be considered. The most complex scenario would be that, in this case, the scale would measure a different factor (or perhaps more than one) with different item measurement properties (Michaelis & Eysenck, 1971). Conceptually, this last point would mean that, under faking-motivation conditions, respondents would attach a different meaning to the items. The evidence reviewed below, however, suggests that it is more reasonable to start with a more parsimonious hypothesis.

As a base hypothesis we shall consider here the "theta-shift" model (e.g. Zickar & Drasgow, 1996). First, we assume that the Lie scores measure the same common trait under neutral and faking-motivating conditions, and that the item measurement properties remain invariant. Second, we assume that, under faking-motivating conditions, respondents are able to respond to the items as if they had a trait value different from their true value. So, under these conditions, the true trait level is temporarily changed to provide more socially desirable item scores. Methodologically, the hypothesis stated so far concerns several aspects of measurement and structural invariance across the two conditions. The assumption that the same trait is measured under both conditions refers to the aspect of Configurational Invariance. Furthermore, the assumption that the item measurement properties remain invariant refers to the aspect of Measurement Invariance (see e.g. Millsap & Meredith, 2007). Finally, the assumption that the trait levels change across both conditions implies non-invariance of the structural parameters (trait means and perhaps trait variances). This last assumption concerns the aspect of Structural Invariance.

So far, empirical evidence on the measurement and structural invariance issues of the Lie scale has mainly been based on analyses at the total-score level. Most of the assessment of measurement invariance has been based on reliability estimates, and the results are generally consistent. Internal consistency of the Lie scores remains relatively high and is quite stable across different levels of faking conditions (Eysenck et al., 1974; Eysenck & Eysenck, 1976). In addition, the exploratory factor analysis which Michaelis and Eysenck (1971) carried out in high- and low-motivation groups suggested a very similar factor pattern in both groups.

As for structural changes, evidence clearly suggests that when instructed to fake good, or under highly motivating conditions for doing so, respondents are able to substantially modify their Lie scores (by as much as one standard deviation or more) in a more conformist and socially desirable direction (Furnham, 1986; Michaelis & Eysenck, 1971). Results concerning variances are less clear. It is possible that faking-motivating conditions affect different respondents differently, thus increasing inter-individual differences and so score variance (McCrae & Costa, 1983; McFarland & Ryan, 2000). However, it is also possible that respondents tend to give more stereotyped and similar answers to some items and so increase the homogeneity of the scores (Cowles, Darling, & Skanes, 1992). In general, results tend to support the first interpretation, and somewhat larger dispersions are generally obtained under faking-good motivating conditions (Cowles et al., 1992; Eysenck et al., 1974).

While the evidence discussed above is valuable, it has clear limitations. First, it does not assess measurement invariance at the item level (some items might remain invariant whereas others do not). Second, the assessment is purely descriptive and does not provide evidence regarding whether the scale measures the same or different traits under the different conditions. Finally, differences in means and especially variances might reflect restriction-of-range (floor and ceiling) effects, and the reliability estimates are dependent on the score variances.

The present study deals with the interpretation of the Lie scores under neutral and faking-good conditions, and assesses the measurement and structural invariance aspects derived from the theta-shift hypothesis. The study uses a two-group situational design based on standard and instructed faking-good conditions. It aims to extend previous evidence and makes three main contributions with respect to previous studies. First, the analysis is not at the scale level but at the item level. Second, the study is model-based and the appropriateness of the base theta-shift hypothesis is assessed by conducting a model-data fit investigation. Third, the model allows for a separate and detailed analysis of the measurement and structural components of the hypothesis.

1.1. The model and rationale

The model used in the present study is the multiple-group extension of the two-parameter normal-ogive item-response-theory (IRT) model (Muthén & Christoffersson, 1981), which is formulated using an underlying-variables factor-analytic parameterization. The description that follows is non-technical and focuses on the usefulness of the model as well as on the meaning and interpretation of the model parameters and results. Technical discussions can be found in Muthén and Christoffersson (1981) and Millsap and Tein (2004).

Standard factor analysis (FA) is a model for continuous and unlimited variables, so it cannot be correctly applied to binary item responses. Although in practice it could work well in some cases, in others, particularly when some of the items are extreme, it tends to give rise to differential attenuation problems because of the different proportions of item endorsement. A more appropriate approach in this case is to assume that there are continuous-unlimited latent response variables which underlie the observed binary responses, and that the standard FA model holds for these response variables. Furthermore, the observed binary responses are assumed to arise by dichotomization of the latent responses at a certain item threshold. This approach partly avoids the differential attenuation problems, gives more correct and less biased estimates of the factor loadings, and provides a more accurate assessment of the appropriateness of the model.
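To illustrate the differential attenuation problem with a toy example (not part of the original study; the correlation and thresholds below are made up), the following sketch dichotomizes two latent responses that correlate .50 at increasingly unequal thresholds and shows how the Pearson (phi) correlation between the resulting binary items shrinks as one item becomes more extreme:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 0.50

# Latent response variables: bivariate standard normal with correlation rho.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

for tau2 in (0.0, 1.0, 2.0):
    x1 = (z[:, 0] > 0.0).astype(int)    # item 1 dichotomized at a median threshold
    x2 = (z[:, 1] > tau2).astype(int)   # item 2 dichotomized at an increasingly extreme threshold
    phi = np.corrcoef(x1, x2)[0, 1]
    print(f"threshold of item 2 = {tau2:.1f} -> phi = {phi:.2f} (latent correlation = {rho})")
```

With both thresholds at zero the phi correlation is already attenuated to roughly .33, and it drops further as the threshold of the second item moves away from zero, even though the correlation between the latent responses is .50 in every case. Factoring such differentially attenuated correlations is what produces the problems referred to above.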

The multiple-group extension of the normal-ogive model formulated as an underlying-variables FA model is now described. Consider the Lie scale, made up of 21 binary-scored items. It is assumed that the items measure a common personality trait $\theta$. The scale is administered in two groups: group 1, standard, honest instructions, and group 2, faking-good instructions. Let $x^{(g)}_{ij}$ be the observed binary score of respondent i, belonging to group g, on item j, and let $x^{*(g)}_{ij}$ be the corresponding underlying response variable. The standard one-common-factor model is assumed to hold for the underlying responses:

\[
x^{*(g)}_{ij} = \lambda^{(g)}_{j}\,\theta_i + \varepsilon^{(g)}_{ij},
\]

where the $\lambda^{(g)}_{j}$ parameters are the factor loadings, and the $\varepsilon^{(g)}_{ij}$ terms are the measurement errors.

The relation between the observed binary response and the corresponding underlying variable is governed by the following step function:

\[
x^{(g)}_{ij} =
\begin{cases}
1, & \text{if } x^{*(g)}_{ij} > \tau^{(g)}_{j},\\
0, & \text{otherwise,}
\end{cases}
\]

where $\tau^{(g)}_{j}$ is a threshold parameter which can be interpreted as an index of item location or item difficulty. Indeed, the higher the value of $\tau^{(g)}_{j}$, the higher the trait level must be to endorse the item.
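Although the authors do not write it out, the link between this parameterization and the two-parameter normal-ogive IRT form follows directly from the two equations above. Assuming normally distributed measurement errors with standard deviation $\sigma^{(g)}_{\varepsilon_j}$ (this is simply what the normal-ogive model implies, not an additional result of the paper), the probability of endorsing item j is

\[
P\left(x^{(g)}_{ij}=1 \mid \theta_i\right)
 = P\left(\lambda^{(g)}_{j}\theta_i + \varepsilon^{(g)}_{ij} > \tau^{(g)}_{j}\right)
 = \Phi\!\left(\frac{\lambda^{(g)}_{j}\theta_i - \tau^{(g)}_{j}}{\sigma^{(g)}_{\varepsilon_j}}\right),
\]

where $\Phi$ denotes the standard normal distribution function, so the loading and the threshold play the roles of the IRT discrimination and location parameters.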

Overall, each item is characterized by three measurement parameters: the threshold, the factor loading and the variance of the measurement errors. These parameters indicate how the item functions as a measure of the trait. Therefore, if these three item parameters are found to be invariant in both groups, this means that the item functions in the same way as a measure under standard and faking conditions. If this result is obtained for all of the items, it is interpreted as the Lie scale measuring the common trait in the same way under both conditions. This complete form of measurement invariance based on the FA model is known as strict factorial invariance (Millsap & Meredith, 2007).
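To make the model and the theta-shift hypothesis concrete, the following sketch (a purely hypothetical illustration with made-up loadings, thresholds and group parameters, not the authors' estimates) generates binary responses in two groups that share the same item parameters while the trait distribution is shifted upwards in the faking group:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invariant item parameters shared by both groups (illustrative values only).
loadings   = np.array([0.6, 0.4, 0.7, 0.5, 0.3])
thresholds = np.array([-0.8, 0.2, -0.3, 0.5, 1.0])
n_items = loadings.size

def simulate(n, theta_mean, theta_sd, error_sd=1.0):
    """Binary item scores generated from the underlying-variables FA model."""
    theta = rng.normal(theta_mean, theta_sd, size=n)                          # trait levels
    latent = np.outer(theta, loadings) + rng.normal(0.0, error_sd, (n, n_items))
    return (latent > thresholds).astype(int)                                  # dichotomize at the thresholds

# Group 1 (standard instructions): factor mean 0 and SD 1, the identification constraints used in the study.
x_std = simulate(401, theta_mean=0.0, theta_sd=1.0)
# Group 2 (faking-good instructions): trait levels temporarily shifted upwards and more dispersed.
x_fake = simulate(361, theta_mean=1.5, theta_sd=1.3)

print("mean scale score, standard:", round(x_std.sum(axis=1).mean(), 2))
print("mean scale score, faking:  ", round(x_fake.sum(axis=1).mean(), 2))
```

Because the loadings and thresholds are identical in the two calls, any difference between the simulated scale scores is driven entirely by the structural (trait) parameters, which is exactly the pattern that strong measurement invariance combined with a theta shift predicts for the real data.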

It has been found in applied research that strict factorial invariance is hard to obtain. In well-designed personality scales, invariance of the item thresholds and loadings is an attainable goal. However, the additional condition of invariant error variances is more difficult to achieve (Muthén & Lehman, 1985). Conceptually, the item-error variance is thought to reflect random disturbance fluctuations, including incidental behaviours that arise in the test-taking situation and minor individual differences. These determinants may differ in both groups (Muthén & Lehman, 1985). The condition of invariant thresholds and loadings, but different error variances, is known as strong factorial invariance (Millsap & Meredith, 2007). If this condition is achieved, the result is interpreted as the item measuring the same trait in both groups, with the same item-trait relations, but with a different degree of precision.

The analysis of a model such as the one described usually proceeds in two stages. In the first stage the model is fitted separately in both groups. Provided that the unidimensional solution holds in both groups with similar factor structures, we proceed to the second stage, which is the multiple-group analysis. First, the strong-invariance solution is assessed. If an acceptable fit is reached, the more restricted, nested strict-invariance model is then fitted.

The structural parameters of the model are the means and variances of $\theta$ in each group. In the present study they are identified and estimated by fixing the factor mean and variance to 0 and 1, respectively, in group 1. The mean and variance in group 2 can then be identified and estimated relative to the fixed values in group 1 (Millsap & Tein, 2004; Muthén & Christoffersson, 1981). Mean differences in the expected direction would support the theta-shift hypothesis. Larger variance estimates in group 2 would support the hypothesis of increasing differentiation under faking-motivating conditions. In both cases the structural estimates can be considered to be more valid indicators of the respondents' 'true' levels than the raw scores, given that the former are not potentially affected by restriction-of-range effects.
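In symbols (our notation; the paper does not name these parameters), the identification constraints can be written as

\[
\kappa^{(1)} = 0, \quad \phi^{(1)} = 1, \qquad \kappa^{(2)},\ \phi^{(2)} \ \text{free},
\]

where $\kappa^{(g)}$ and $\phi^{(g)}$ are the factor mean and variance in group g. Because the group-1 factor variance is fixed at 1, the estimated $\kappa^{(2)}$ can be read directly as the average theta shift expressed in group-1 standard deviation units.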

Table 1
Descriptive statistics at the scale score level.

        G1: Standard (N1 = 401)    G2: Faking (N2 = 361)
Mean    10.43 (10.13; 10.73)       18.71 (18.48; 18.95)
SD      3.66 (3.43; 3.94)          2.71 (2.53; 2.93)
ρ̂xx     0.74 (0.71; 0.77)          0.82 (0.80; 0.85)
ES      2.57

Note: 90% confidence intervals within brackets; ρ̂xx: estimated reliability (Cronbach's α); ES: standardized effect size (Cohen's d).

2. Method

2.1. Participants and procedure

Respondents were 762 undergraduate students from the Psychology and Social Sciences faculties of a Spanish university, and were assigned to two groups: group 1 (N = 401; 318 female, mean age 21.75): standard, honest instructions, and group 2 (N = 361; 305 female, mean age 22.13): faking-good instructions. The questionnaires were administered in a paper-and-pencil version by the same person in all cases, and were completed voluntarily in classroom groups of 25–60 students. The administration was anonymous, and the respondents had to provide only two particulars: gender and age.

In group 1, the instructions given before administration were those provided in the Spanish EPQ manual (Seisdedos, 1984). They emphasize that there are no good or bad answers and advise respondents to give honest answers without elaborating too much. In group 2 we used the instructions detailed in Eysenck et al. (1974). The respondent is instructed to imagine him- or herself in the place of an applicant for a job that he/she really wants. S/he should try to give a good impression when answering by describing what s/he thinks the employer would like him/her to be, regardless of the truthful answer.

2.2. Measures

The study used the 21-item Lie scale of a Spanish translation of the EPQ-R (Aguilar, Tous, & Andrés, 1990). The number and order of the items in the questionnaire correspond exactly to those of the original version (Eysenck et al., 1985).

3. Results

The analysis of the data proceeded in three stages. In the first stage, the analyses were at the total-scale score level. Means, variances and reliabilities were estimated in each group, and the results were compared to those obtained in previous studies. The other two stages dealt with the FA model and were those described above: the separate-group analysis and the multiple-group analyses. In the multiple-group stage, once the level of measurement invariance supported by the data was determined, the structural hypotheses on group differences in factor means and variances were assessed.

3.1. Preliminary analyses

Table 1 shows the descriptive statistics and reliability estimates of the total-scale scores in both groups. Results were obtained with all the items scored in the same direction, so that a larger score means more socially desirable responding. First, as far as the means are concerned, the results agree with the existing evidence. The mean in group 1 (10.43) is not far from the values usually found in normal-range samples when the Spanish adaptation of the EPQ is used under standard instructions (10.51: Aguilar et al., 1990; 10.30: Ferrando et al., 1997; 9.93: Seisdedos, 1984). The differences found in group 2 are also consistent with previous evidence. There is a large mean difference of more than 2.5 standard deviations (see the effect size) in the expected direction.
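As a check on the reported effect size, Cohen's d can be recomputed from the summary statistics in Table 1 using the pooled standard deviation; the assumption that a pooled-SD denominator was used is ours, but it essentially reproduces the reported value (the small discrepancy with 2.57 presumably reflects rounding of the published means and SDs):

```python
import math

n1, m1, s1 = 401, 10.43, 3.66   # group 1: standard instructions
n2, m2, s2 = 361, 18.71, 2.71   # group 2: faking-good instructions

# Pooled standard deviation across the two groups.
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m2 - m1) / s_pooled
print(f"Cohen's d = {d:.2f}")   # approximately 2.55, vs. 2.57 reported in Table 1
```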

The standard deviation in group 1 (3.66) is also similar to the estimates obtained in adult Spanish samples under standard instructions (which range from 3.32 to 4.32). However, the deviation in group 2 is somewhat smaller. No attempt is made to interpret this result, because we believe it mainly reflects a restriction-of-range (ceiling) effect. Indeed, note that the mean score in group 2 is quite close to the top end of the scale (i.e. 21). Finally, the reliability estimate in group 1 (0.74) is similar to that obtained in previous standard analyses in Spanish samples (which range from 0.71 to 0.80). It is clearly larger in group 2, and more so if we take into account that the score variance is somewhat reduced in this group. If this reduction is corrected for, the expected reliability in group 2, provided that the scores in this group had the same variance as those in group 1, would be 0.90. This result suggests that the scale behaves as a more precise measure under faking-good instructions.
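The 0.90 figure can be reproduced with the classical correction of a reliability coefficient for restriction of range, which assumes that the error variance is the same in both groups and rescales the group-2 score variance to that of group 1; that this is the correction the authors applied is our inference, but it matches the reported value:

```python
alpha_g2, sd_g2, sd_g1 = 0.82, 2.71, 3.66

# Keep the error variance implied in group 2, but evaluate it against the group-1 score variance.
corrected = 1 - (1 - alpha_g2) * (sd_g2**2 / sd_g1**2)
print(f"reliability corrected for restriction of range: {corrected:.2f}")   # about 0.90
```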

Next, the FA model was fitted separately in both groups. Complete results are available from the authors, and are only summarized here. In both cases the unidimensional model had a reasonably good fit, and the factor solutions showed a similar positive-manifold structure, with all the loadings being positive and generally substantial. These results suggest that the theta-shift model with at least strong measurement invariance is a good baseline model for starting the main multiple-group analyses.

3.2. Multiple-group analyses

The multiple-group models were fitted using mean-corrected weighted least squares estimation (WLSM) as implemented in the Mplus program, version 1.04 (Muthén & Muthén, 1999). WLSM is a simplified alternative to standard WLS estimation that generally works well even when applied to scales of realistic length and not very large sample sizes (Muthén, 1993), which is the present case. Table 2 shows the model-fit results corresponding to the initial strong-invariance model and the nested strict-invariance model.

Results in Table 2 are fairly clear, as there is agreement among the different measures of fit. Both the point and interval estimates of the RMSEA and the goodness-of-fit measures (NNFI and CFI) suggest that the fit of the strong-invariance model is reasonably good, above all taking into account the size of the model: analyses of this type are usually applied to small sets of 5–10 items, whereas here we are dealing with a set of 21 items. The strict-invariance model no longer fits as well as the strong model did. Although both models are nested, the scaled chi-square statistics obtained under WLSM estimation are not suitable for conducting a chi-square difference test. Even so, the remaining measures suggest a small but noticeable deterioration in the goodness of fit. It therefore seems safer to assume that the data only support the hypothesis of strong measurement invariance of the factorial solution. If the strong-invariance model is accepted, inspection of the factor and item-error variance estimates in both groups suggests that the items tend to behave as more precise measures in group 2, a result which agrees with the reliability estimates in Table 1.
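The RMSEA values in Table 2 can be recovered from the reported chi-square statistics and degrees of freedom under one common multiple-group convention, in which the statistic is scaled by the number of groups G; whether Mplus 1.04 used exactly this formula is an assumption on our part, but it reproduces the reported 0.06 and 0.07:

```python
import math

def rmsea_multigroup(chi2, df, n_total, n_groups=2):
    """Multiple-group RMSEA under the convention sqrt(G * max(chi2 - df, 0) / (df * N))."""
    return math.sqrt(n_groups * max(chi2 - df, 0.0) / (df * n_total))

N = 401 + 361
print(f"strong invariance: RMSEA = {rmsea_multigroup(948.33, 397, N):.3f}")    # ~0.060
print(f"strict invariance: RMSEA = {rmsea_multigroup(1134.30, 417, N):.3f}")   # ~0.067
```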

Table 2
Assessment of fit of the multiple-group models.

Model    χ²        d.f.   RMSEA   90% C.I.        NNFI   CFI
Strong   948.33    397    0.06    (0.05; 0.06)    0.92   0.93
Strict   1134.30   417    0.07    (0.06; 0.07)    0.90   0.90

Note: χ²: WLSM chi-square goodness-of-fit statistic; d.f.: degrees of freedom; RMSEA and 90% C.I.: point and interval estimate of the root mean squared error of approximation; NNFI: non-normed fit index; CFI: comparative fit index.

Table 3 shows the threshold and loading measurement estimates obtained under the strong-invariance solution. According to the threshold values, there are no extreme items in this data. The pattern of loadings is quite similar to that obtained by Ferrando et al. (1997) in a Spanish study of the Lie scale based on a large sample. In particular, the only item with a clearly weak loading is the same in both studies: "Have you ever insisted on having your own way?"

Table 3
Strong-invariance factorial solution for the 21 Lie items.

Item     Abbreviated item content                        τ       λ
1 (15)   Keep your promise                              -0.97    0.61
2 (23)   Habits good and desirable                       0.52    0.38
3 (39)   As a child, do without grumbling                0.45    0.24
4 (62)   Wash before a meal                             -0.34    0.27
5 (86)   Practice what you preach                        0.42    0.49
6 (98)   Always admit a mistake                         -0.77    0.47
7 (4)    Taken somebody's prize                         -1.02    0.53
8 (10)   Helping yourself more than share of anything   -0.92    0.25
9 (19)   Blamed someone for your fault                  -0.57    0.62
10 (27)  Taken someone else's thing                      0.27    0.68
11 (32)  Talk about things you don't know               -0.73    0.58
12 (44)  Broken or lost someone else's thing             0.50    0.62
13 (49)  Boast a little                                 -0.29    0.49
14 (53)  Said bad or nasty things about anyone           0.78    0.52
15 (57)  As a child, cheeky to your parents             -0.72    0.22
16 (66)  Cheated at a game                               0.16    0.62
17 (71)  Taken advantage of somebody                    -0.42    0.77
18 (77)  Dodge paying taxes                              0.52    0.36
19 (82)  Insisted on your own way                        1.39    0.08
20 (89)  Late for an appointment or work                 0.94    0.47
21 (98)  Put off until tomorrow                          1.04    0.57

Note: Within brackets is the position of the item in the EPQ-R.

The assessment of the structural hypotheses is now summarized. The estimated factor mean and standard deviation in group 2 were 5.37 and 2.61, and the corresponding 90% confidence intervals were (4.10; 6.64) and (1.79; 3.24). As discussed above, the mean and standard deviation were fixed at 0 and 1 in group 1. Assessment of whether or not these fixed values lie within the corresponding confidence interval is equivalent to a formal test of equality in both groups. Clearly, there are substantial differences between the factor means and standard deviations in both groups. As for the means, the difference goes in the expected direction, and the effect size based on the factor estimates is considerably larger than that in Table 1 based on the scale scores. This result agrees with the explanation based on ceiling effects which was given for the total-score results. As for the standard deviations, the dispersion of the factor scores is estimated to be larger in group 2. This result goes in the opposite direction to that obtained in Table 1, but it would make sense if we accept the interpretation that the variance of the scale scores in group 2 mainly reflects a restriction-of-range ceiling artefact. The increased estimated dispersion in group 2 would support the hypothesis that the faking instructions affect individuals differentially.

4. Discussion

This article attempted to assess rigorously some measurement and structural invariance hypotheses concerning the interpretation of the EPQ Lie scores under standard and faking-good instructions. We assumed that the Lie scores measure a consistent trait and that the measurement characteristics of the items remain essentially invariant across the different faking-motivating conditions. These measurement invariance assumptions agree with the "traited" position mentioned at the beginning of the paper. However, we also assumed that the levels of this trait can temporarily change under faking-motivating conditions (the theta-shift hypothesis), and this structural non-invariance assumption agrees with the "situational" position. Unlike previous studies, the present analyses were model-based, and the appropriateness of the hypotheses considered could be rigorously assessed.

The results obtained are compatible with the assumptions above. In summary, they suggest that: (a) the Lie scores measure a common trait under both standard and faking-instructed conditions, (b) the measurement characteristics of all of the Lie items remain essentially invariant under both conditions, but the items tend to function more precisely as measures under faking-good conditions, (c) the faking instructions produce a temporary change of the respondents' factor levels towards more conformist, socially desirable responding, and this change is reflected in a substantial mean difference in the factor scores, and (d) this theta-shift change is not constant, and the instructions affect different respondents differently. Results (a) and (b) are perhaps of more theoretical interest, and contribute to an understanding of how the Lie scores must be interpreted. The reasons why the items appear to function more accurately under faking-good conditions are a topic of interest for future research.

Results (c) and (d) are of more practical interest and agree with the purposes for which the Lie scale is used in applications. Indeed, one of the main purposes is to detect which individuals are most affected by faking-motivating conditions, to the extent that their scores on the content scales are invalidated. However, the result that faking instructions (and perhaps other faking-motivating conditions) affect individuals differentially is not sufficient on its own to allow us to detect which individuals are most affected. A within-subject design for measuring change would be appropriate when studying this issue in the future. Validity studies relating the Lie scores to external variables would also be useful.

The limitations of the present study should be acknowledged. First, models are, at best, only simplified representations of reality, and the result that a model has an acceptable fit cannot be interpreted as a confirmation of the hypothesis it assesses. Rather, this result only means that the model provides a plausible explanation that is compatible with the data. Second, both the level of measurement invariance and the structural between-group changes may depend on the conditions that define the groups and the type of respondents. Therefore, generalizations from the present study are necessarily limited. Further cross-cultural studies based on different samples and types of motivating conditions for faking are needed. In particular, it would be of clear interest to assess whether the present results would hold not only under instructed-faking conditions but also in real applicant samples.

The "internal" analysis used in this study suggests that the Lie scores consistently measure a unitary trait. However, it does not provide further information on how this trait can be conceptualized or labelled. To answer this question, it seems that validity studies are needed to assess which personality dimensions the Lie scores are related to (McFarland & Ryan, 2000; Paulhus, 1991). Paulhus (1991), for example, summarised a series of factor analyses in which the Lie scores were related to scores on other measures of social desirability. Together with the Wiggins Sd and Paulhus impression management scale scores, the Lie scores appeared to define a common factor which was labelled "Impression Management". These results can be used in further "external" extensions of the present analyses. In this way, the model used here can be extended to incorporate external variables (for example, scores on the Wiggins and Paulhus scales) and used to assess whether they measure essentially the same construct as the Lie scores.

Acknowledgments

This research was partially supported by a grant from the Spanish Ministry of Science and Technology (No. SEJ2005-09170-C04-04/PSIC) with the collaboration of the European Fund for the Development of Regions.

References

Aguilar, A., Tous, J. M., & Andrés, A. (1990). Adaptación y estudio psicométrico del EPQ-R. Anuario de Psicología, 46, 101–118.

Cowles, M., Darling, M., & Skanes, A. (1992). Some characteristics of the simulated self. Personality and Individual Differences, 13, 501–510.

Dicken, C. (1963). Good impression, social desirability and acquiescence as suppressor variables. Educational and Psychological Measurement, 23, 699–720.

Ellingson, J. E., Sackett, P. R., & Hough, L. M. (1999). Social desirability corrections in personality measurement: Issues of applicant comparisons and construct validity. Journal of Applied Psychology, 84, 155–166.

Eysenck, H. J., & Eysenck, S. B. G. (1975). Manual of the Eysenck Personality Questionnaire. London: Hodder & Stoughton.

Eysenck, H. J., & Eysenck, S. B. G. (1976). Psychoticism as a dimension of personality. New York: Crane-Russak.

Eysenck, S. B. G., Eysenck, H. J., & Barrett, P. T. (1985). A revised version of the psychoticism scale. Personality and Individual Differences, 6, 21–29.

Eysenck, S. B. G., Eysenck, H. J., & Shaw, L. (1974). The modification of personality and Lie scores by special 'honestly' instructions. British Journal of Social and Clinical Psychology, 13, 41–50.

Ferrando, P. J., Chico, E., & Lorenzo, U. (1997). Dimensional analysis of the EPQ-R Lie scale with a Spanish sample: Gender differences and relations to N, E, and P. Personality and Individual Differences, 23, 631–637.

Furnham, A. (1986). Response bias, social desirability and dissimulation. Personality and Individual Differences, 7, 385–400.

Katz, Y. J., & Francis, L. J. (1991). The dual nature of the EPQ Lie scale? A study among university students in Israel. Social Behavior and Personality, 9, 217–222.

Lajunen, T., & Scherler, H. R. (1999). Is the EPQ Lie scale bidimensional? Validation study of the structure of the EPQ Lie scale among Finnish and Turkish university students. Personality and Individual Differences, 26, 657–664.

Loo, R. (1995). Cross-cultural validation of the dual nature of the EPQ Lie scale with a Japanese sample. Personality and Individual Differences, 18, 297–299.

McCrae, R. R., & Costa, P. T. (1983). Social desirability scales: More substance than style. Journal of Consulting and Clinical Psychology, 51, 882–888.

McFarland, L. A., & Ryan, A. M. (2000). Variance in faking across noncognitive measures. Journal of Applied Psychology, 85, 812–821.

Michaelis, W., & Eysenck, H. J. (1971). The determination of personality inventory factor patterns and intercorrelations by changes in real-life motivation. Journal of Genetic Psychology, 118, 223–234.

Millsap, R. E., & Meredith, W. (2007). Factorial invariance: Historical perspectives and new problems. In R. Cudeck & R. C. MacCallum (Eds.), Factor analysis at 100 (pp. 131–152). Mahwah: LEA.

Millsap, R. E., & Tein, J. Y. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39, 479–516.

Muthén, B. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 205–234). Newbury Park: Sage.

Muthén, B., & Christoffersson, A. (1981). Simultaneous factor analysis of dichotomous variables in several groups. Psychometrika, 46, 407–419.

Muthén, B., & Lehman, J. (1985). Multiple group IRT modeling: Applications to item bias analysis. Journal of Educational Statistics, 10, 133–142.

Muthén, L. K., & Muthén, B. (1999). Mplus user's guide. Los Angeles: Muthén and Muthén.

Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17–59). San Diego: Academic Press.

Seisdedos, N. (1984). EPQ: Cuestionario de personalidad para niños y adultos. Madrid: TEA Ediciones.

Zickar, M. J., & Drasgow, F. (1996). Detecting faking on a personality instrument using appropriateness measurement. Applied Psychological Measurement, 20, 71–87.