
Personality and Individual Differences 48 (2010) 596–600

Contents lists available at ScienceDirect

Personality and Individual Differences

journal homepage: www.elsevier.com/locate/paid

Acquiescence and social desirability as item response determinants: An IRT-based study with the Marlowe–Crowne and the EPQ Lie scales

Pere J. Ferrando *, Cristina Anguiano-Carrasco
Research Center for Behavioral Assessment, Rovira i Virgili University, Facultad de Psicologia, Carretera Valls s/n, 43007 Tarragona, Spain

Article info

Article history:
Received 7 July 2009
Received in revised form 9 December 2009
Accepted 14 December 2009
Available online 12 January 2010

Keywords:
Acquiescence
Social desirability
Marlowe–Crowne scale
Lie scale
Item factor analysis
Item response theory

0191-8869/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.paid.2009.12.013

* Corresponding author. Tel.: +34 977558079; fax: +34 977558088.
E-mail address: [email protected] (P.J. Ferrando).

Abstract

The present study aims to shed some light on an old controversy about the joint impact of acquiescence and social desirability on item responses. There are two main hypotheses: (a) of the two biases, social desirability is by far the prime determinant, and (b) acquiescence operates in all sorts of items, including those impacted by social desirability. A new methodology is harnessed to assess these hypotheses in an empirical study based on two well-known social desirability scales: the Marlowe–Crowne social desirability scale and the Lie scale of Eysenck’s questionnaires. In both scales, the results suggest that even items which primarily measure social desirability can also be impacted to some extent by acquiescence.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Acquiescent responding (AR) and socially desirable responding (SDR) are considered to be the most important response biases in personality measurement (e.g. Paulhus, 1991). Furthermore, in real test-taking situations, both types of bias are expected to operate simultaneously to a greater or lesser extent (e.g. Hofstee, Ten Berge, & Hendricks, 1998). However, the question of how AR and SDR jointly impact item responses has received surprisingly little attention in the vast literature on response biases. A review shows that two general positions have been considered. The first position assumes that, of the two biases, SDR is by far the prime determinant. On this view, AR would operate mainly in “neutral” items, and its impact would be negligible in items that evoke the tendency to give socially desirable responses (Edwards, 1961; Jackson & Messick, 1961; Stricker, 1963). The second position assumes that AR operates in all sorts of items, so items that are impacted by SDR can also be impacted by AR (Couch & Keniston, 1960; Diers, 1964).

To assess which of the two positions is more correct, a relatively simple approach would be to study the impact of AR on responses to social desirability (SD) scales (or sets of items intended to measure solely SD). If the first position is correct, the impact of AR on items that are almost pure measures of SD should be negligible. If the second position is correct, however, some impact should be detected. And according to its most extreme formulation (Couch & Keniston, 1960; Diers, 1964), this impact should be the same as that generally found in scales that measure substantive dimensions. This approach has been considered in a few studies (Bernhardson, 1970; Diers, 1964; Edwards, 1961; Jackson & Messick, 1961). However, these studies were mostly conducted at the total-scale level, and their results on the specific issue considered here were far from conclusive. Methodologically, the main problem encountered was how to clearly separate the impact of SDR from the additional impact of AR.

At present, probably the best procedure for controlling AR is to use a balanced set of items (Ray, 1983). In a well-designed balanced set, all the items are positively worded, but half of them measure in one direction of the dimension whereas the other half measure in the opposite direction. In recent decades, factor analytic procedures that are based on balanced sets and allow the impact of AR to be assessed in detail have been proposed (Billiet & McClendon, 2000; Ferrando, Lorenzo-Seva, & Chico, 2003). These procedures provide new tools for addressing this issue at the item level. However, they treat the item responses as continuous variables and require fully balanced scales, whereas most well-known SD measures use binary items and are only partially balanced (i.e. the numbers of positively keyed and negatively keyed items differ). Recently, Ferrando, Lorenzo-Seva, and Chico (2009) proposed a factor analysis (FA) based procedure that can be used with binary items and partially balanced scales, and this is the procedure we shall use here.
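The role of balance can be illustrated with a deliberately extreme case: a respondent who answers “yes” to every item. In the minimal sketch below (hypothetical function names; only the counts of desirably and undesirably keyed items come from the two scales described below, and the item order is arbitrary), the all-“yes” pattern lands exactly on the midpoint of a fully balanced set, so pure yea-saying cancels out of the total score, but it does not do so on a partially balanced set.

```python
def sd_score(responses, keys):
    """Number of answers given in the socially desirable direction.

    responses: list of bools (True = "yes"/agree).
    keys: list of "d" (desirable item) or "u" (undesirable item).
    """
    if len(responses) != len(keys):
        raise ValueError("responses and keys must have equal length")
    return sum(1 for agreed, key in zip(responses, keys)
               if (key == "d" and agreed) or (key == "u" and not agreed))


def yea_sayer_offset(keys):
    """All-"yes" score minus the scale midpoint (0 means AR cancels out)."""
    return sd_score([True] * len(keys), keys) - len(keys) / 2


balanced = ["d"] * 10 + ["u"] * 10   # hypothetical fully balanced 20-item set
lie_keys = ["d"] * 6 + ["u"] * 15    # Lie scale key counts
mcsd_keys = ["d"] * 18 + ["u"] * 15  # MCSD scale key counts

for name, keys in (("balanced", balanced), ("Lie", lie_keys), ("MCSD", mcsd_keys)):
    print(name, yea_sayer_offset(keys))
```

With the Lie key counts (6 desirable, 15 undesirable) an all-“yes” pattern scores 4.5 points below the midpoint; with the MCSD key counts (18 and 15) it scores 1.5 points above it.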

The main purpose of the present study is to assess which of the two positions above is more appropriate, using the direct approach based on analyzing well-known SD measures. The study is solely concerned with the results obtained when these measures are administered under neutral conditions (i.e. voluntarily and with standard instructions). Unlike previous studies, it focuses on the item level, and it uses the FA procedure proposed by Ferrando et al. (2009). It is hoped that this new analytical approach can provide clearer results on this issue.

The theoretical approach adopted here considers AR and SD as truly individual-difference variables that can be modelled as common factors (Eysenck & Eysenck, 1976; Ferrando et al., 2009) that are essentially independent of each other. Some authors (Hofstee et al., 1998; Stricker, 1963) have suggested that AR and SD should be positively related, because they share a tendency toward conformity or compliance. Empirical evidence, however, suggests that they are essentially uncorrelated (Greenwald & Clausen, 1970; Stricker, 1963).

The two SD measures considered here are the 21-item Lie scale included in the Eysenck Personality Questionnaires for adults (EPQ-A, Eysenck & Eysenck, 1975; EPQ-R, Eysenck, Eysenck, & Barrett, 1985), and the 33-item Marlowe–Crowne social desirability (MCSD) scale (Crowne & Marlowe, 1960). Both use binary items and are partially balanced. Of the 21 Lie items, 15 describe undesirable behaviours and the remaining 6 describe desirable behaviours. Of the 33 MCSD items, 18 are socially desirable and 15 undesirable. Both scales are very popular, have received a great deal of attention, and have been used intensively in research. At the theoretical level, when administered under neutral conditions, both were intended to measure a general, unitary trait of SD mainly conceptualized as need for social approval (Crowne & Marlowe, 1960; Eysenck & Eysenck, 1975, 1976). This intention can be seen when the item contents are inspected. They mainly reflect approved behaviours that are believed to be unlikely to occur (desirably keyed items) or attitudes and practices that are socially undesirable but common, such as minor dishonesties, bad thoughts, weaknesses of character, etc. (undesirably keyed items).

Paulhus (1991) distinguished two components of SD within the general dimension. The first component, self-deceptive positivity, is conceptualized as an honest but overly positive view of oneself. The second component, impression management, is closer to the traditional view of SD: the respondent deliberately tailors his/her answers to create a more positive social image. According to Paulhus (1991), the Lie scales in Eysenck’s questionnaires are almost pure measures of impression management, whereas the MCSD scale, although it loads more on the impression management component, is a less pure measure that also taps the first component to some extent. The procedures used here can also provide information on this issue.

Both scales have been factor analyzed in a large number of studies aimed at assessing their appropriate dimensionality. Reviews are provided by Ferrando, Chico, and Lorenzo (1997) and Lajunen and Scherler (1999) for the Lie scale, and by Barger (2002) and Ferrando and Chico (2000) for the MCSD scale. In this paper we shall not discuss these studies. Rather, we shall discuss some key points of how the scales were developed and analyze the characteristics of their items, a type of analysis that has been largely neglected for both scales.

The MCSD scale was developed using judge agreement and item–total discriminating power criteria (Crowne & Marlowe, 1960). An analysis of its item stems suggests, a priori, three types of problem. First, most item stems are long and complex. Second, 14 item stems are negatively worded, with the negation expressed by the modifiers “not” or “never”. Furthermore, some of these items include double negatives (3 in the Spanish adaptation). Evidence shows that negatively worded items of this type tend to have poor measurement properties and might give rise to problems when assessing dimensionality and structure. Third, the content of three pairs of items is highly redundant: items 5 and 10 refer to lack of confidence in one’s own abilities, items 6 and 22 refer to doing things one’s own way, and items 16 and 20 refer to admitting one’s own errors. Pairs of redundant items such as these are likely to give rise to artifactual or semantic doublet factors (e.g. Kline, 1998).

The EPQ Lie scale was developed using FA procedures with the explicit aim of measuring a unitary factor (Eysenck & Eysenck, 1976, chap. 11), so it is expected to have a clearer and stronger FA structure than the MCSD scale. Analysis of the item stems shows that they are all positively worded. They are also generally shorter and simpler, and refer to more specific behaviours than those of the MCSD scale.

1.1. The model and procedure

This section aims solely to provide a conceptual, non-technical discussion that helps to interpret the present results. A more detailed methodological description can be found in Ferrando et al. (2009).

The general model used here is a bidimensional (content–acquiescence) item FA model for binary items, in which the item–factor regressions are modelled as normal ogives. This model is an FA formulation of the bidimensional normal-ogive item response theory (IRT) model. For binary items it is a more plausible model than the conventional linear FA model, which treats the raw binary scores as if they were continuous and unbounded variables. In practice, the main advantage of fitting this model instead of using conventional FA is that it partly avoids the attenuation problems created by the differing difficulties (i.e. proportions of endorsement) of the items. It (a) gives more accurate estimates of the factor loadings, and (b) provides a more accurate assessment of the dimensionality of the data and of the appropriateness of the model.
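To make the contrast with linear FA concrete, here is a minimal sketch of a two-factor normal-ogive item response function in one common FA parameterization (standardized, uncorrelated factors). The function names are hypothetical, the threshold of 0 is invented, and reading the Table 2 loadings of Lie item 86 as loadings in exactly this parameterization is an assumption made only for illustration.

```python
import math


def phi(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ogive_prob(loadings, thetas, tau):
    """P(x = 1 | thetas) for a binary item under a normal-ogive item FA model.

    Assumed parameterization (standardized, uncorrelated factors):
        P = Phi((sum_k lambda_k * theta_k - tau) / psi),
        psi = sqrt(1 - sum_k lambda_k ** 2),
    where tau is the threshold on the latent response variable.
    """
    communality = sum(l * l for l in loadings)
    if communality >= 1.0:
        raise ValueError("squared loadings must sum to less than 1")
    psi = math.sqrt(1.0 - communality)
    z = (sum(l * t for l, t in zip(loadings, thetas)) - tau) / psi
    return phi(z)


def linear_expected(p, loadings, thetas):
    """Linear FA applied to a 0/1 score with endorsement proportion p.

    E[x | thetas] = p + sqrt(p * (1 - p)) * sum_k lambda_k * theta_k,
    which is unbounded and can leave the [0, 1] range.
    """
    return p + math.sqrt(p * (1.0 - p)) * sum(l * t for l, t in zip(loadings, thetas))


# Content (F1) and ACQ (F2) loadings of Lie item 86 from Table 2; tau = 0 is hypothetical
item86 = [0.622, 0.412]
for thetas in ([0.0, 0.0], [0.0, 1.0], [3.0, 3.0]):
    print(thetas,
          round(ogive_prob(item86, thetas, 0.0), 3),
          round(linear_expected(0.5, item86, thetas), 3))
```

At θ = (3, 3) the linear prediction for the 0/1 score exceeds 1, whereas the ogive probability only approaches 1; raising the ACQ trait alone also raises the endorsement probability, which is how the second factor enters the model.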

The model has two stages: calibration and scoring. The calibration stage is a sequential FA in which (a) the acquiescence (ACQ) factor is first estimated, (b) its effect is partialled out of the correlation matrix, and (c) the content factor is estimated from the corrected, acquiescence-free correlation matrix. The procedure used for fitting the model is minimum rank factor analysis (MRFA; Ten Berge & Kiers, 1991). MRFA is a least-squares FA procedure which has the additional advantage of estimating the proportion of common variance accounted for by each of the two factors. These estimates are particularly relevant here for assessing the relative impact of the ACQ and SD factors.
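Steps (b) and (c) can be sketched on a synthetic correlation matrix. This is a minimal illustration under stated assumptions: the ACQ loadings are simply given rather than estimated from a balanced subset, the correlations are taken to be the latent (tetrachoric-type) correlations an ogive model works with, and MRFA is replaced by a plain iterated principal-axis extraction because it is short to write out. All names and loadings are hypothetical.

```python
import math


def partial_out_acq(R, acq):
    """Step (b): subtract the ACQ part a_i * a_j from every off-diagonal correlation."""
    n = len(R)
    return [[R[i][j] - acq[i] * acq[j] if i != j else R[i][j] for j in range(n)]
            for i in range(n)]


def dominant_eigen(M, iters=100):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(M)
    # A constant start vector would be orthogonal to a bipolar eigenvector whose
    # entries sum to zero; a column of M is orthogonal to it only if that
    # eigenvector has a zero in the column's position.
    start = max(range(n), key=lambda c: sum(M[r][c] ** 2 for r in range(n)))
    v = [M[r][start] for r in range(n)]
    for _ in range(iters):
        w = [sum(M[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eig = sum(v[r] * sum(M[r][c] * v[c] for c in range(n)) for r in range(n))
    return eig, v


def one_factor_pa(R, iters=200):
    """Step (c), sketched as iterated principal-axis FA (a stand-in for MRFA)."""
    n = len(R)
    # Initial communalities: largest absolute off-diagonal correlation per item
    h2 = [max(abs(R[i][j]) for j in range(n) if j != i) for i in range(n)]
    loadings = [0.0] * n
    for _ in range(iters):
        reduced = [[h2[i] if i == j else R[i][j] for j in range(n)] for i in range(n)]
        eig, v = dominant_eigen(reduced)
        loadings = [math.sqrt(eig) * x for x in v]
        h2 = [l * l for l in loadings]
    if loadings[0] < 0:  # orientation: item 0 is assumed to be desirably keyed
        loadings = [-l for l in loadings]
    return loadings


# Hypothetical loadings: bipolar content factor, all-positive ACQ factor
content = [0.60, -0.50, 0.55, -0.65, 0.45, -0.40]
acq = [0.30, 0.20, 0.25, 0.35, 0.15, 0.20]
n = len(content)
R = [[1.0 if i == j else content[i] * content[j] + acq[i] * acq[j]
      for j in range(n)] for i in range(n)]

clean = partial_out_acq(R, acq)
print([round(l, 3) for l in one_factor_pa(clean)])
```

Subtracting a_i a_j from each off-diagonal correlation leaves a one-factor matrix, and the extraction recovers the bipolar content loadings used to build it. With real data, of course, the ACQ loadings are themselves estimates and the residual matrix is not exactly one-factor.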

The loadings on the first estimated factor, ACQ, are obtained from a balanced subset of items, and they are all expected to be positive (the higher the acquiescence level, the greater the tendency to agree with all of the items). Next, the content factor is estimated from the “clean” correlation matrix, and it is expected to be bipolar: the positively keyed items are expected to have positive loadings, and the negatively keyed items negative loadings.

We turn now to the scoring stage. In the model used here the factor scores are nonlinear transformations of the item scores, and their precision generally varies across trait levels. However, as in standard FA, a single, marginal measure of reliability can be obtained for each factor. This measure can be used as an auxiliary index for assessing the relative importance of SDR and AR. Finally, the factor scores can be used for further validity assessment. Previous studies on the relations between the Lie and the MCSD scales were based on correlations between raw scores. Now, if AR also impacts the responses, these raw scores in fact reflect the mixed influence of both SDR and AR. If this is so, the assessment would be clearer if the relations between the SD factor scores on the one hand, and between the ACQ factor scores on the other, were assessed separately.
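The marginal reliability mentioned above can be sketched with one common definition: the variance of the factor-score estimates divided by that variance plus the mean squared standard error of the estimates. Whether Ferrando et al. (2009) compute it exactly this way is an assumption, and the function name and the five respondents below are hypothetical.

```python
from statistics import fmean, pvariance


def marginal_reliability(scores, std_errors):
    """Marginal reliability of factor-score estimates.

    Assumed definition: var(scores) / (var(scores) + mean(se ** 2)).
    Each score may have its own standard error; the error variance is
    pooled across trait levels by averaging.
    """
    if not scores or len(scores) != len(std_errors):
        raise ValueError("need non-empty, equal-length inputs")
    score_var = pvariance(scores)
    error_var = fmean(se * se for se in std_errors)
    return score_var / (score_var + error_var)


# Hypothetical factor scores and their standard errors for five respondents
scores = [-1.3, -0.5, 0.0, 0.6, 1.2]
ses = [0.55, 0.40, 0.35, 0.38, 0.50]
print(round(marginal_reliability(scores, ses), 3))
```

Averaging the squared standard errors is what turns level-dependent precision into a single, marginal index.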


Table 2
Two-factor solution: L scale.

Item  Abbreviated item content                      Dir.     F1      F2
4     Taken somebody’s prize                        u    -0.532   0.302
10    Helping yourself more than share of anything  u    -0.277   0.050
15    Keep your promise                             d     0.560   0.349
19    Blamed someone for your fault                 u    -0.597   0.197
23    Habits good and desirable                     d     0.464   0.081
27    Taken someone else’s thing                    u    -0.636   0.267
32    Talk about things you don’t know              u    -0.532   0.236
39    As a child, do without grumbling              d     0.296   0.252
44    Broken or lost someone else’s thing           u    -0.528   0.400
49    Boast a little                                u    -0.402   0.230
53    Said bad or nasty things about anyone         u    -0.421   0.212
57    As a child, cheeky to your parents            u    -0.220   0.046
62    Wash before a meal                            d     0.367   0.152
66    Cheated at a game                             u    -0.593   0.179
71    Taken advantage of somebody                   u    -0.701   0.213
77    Dodge paying taxes                            u    -0.451   0.022
82    Insisted on your own way                      u    -0.271   0.131
86    Practice what you preach                      d     0.622   0.412
89    Late for an appointment or work               u    -0.438   0.315
93    Put off until tomorrow                        u    -0.520   0.038
98    Always admit a mistake                        d     0.524   0.111

Note: u = undesirable item; d = desirable item.
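The sign pattern expected from the calibration stage (all-positive ACQ loadings; content loadings positive for desirable and negative for undesirable items) and the summary figures quoted in the Results can be checked directly against Table 2. A minimal sketch; the helper names are hypothetical and the loadings are transcribed from the table.

```python
# (item, direction, F1 content loading, F2 ACQ loading), transcribed from Table 2
LIE = [
    (4, "u", -0.532, 0.302), (10, "u", -0.277, 0.050), (15, "d", 0.560, 0.349),
    (19, "u", -0.597, 0.197), (23, "d", 0.464, 0.081), (27, "u", -0.636, 0.267),
    (32, "u", -0.532, 0.236), (39, "d", 0.296, 0.252), (44, "u", -0.528, 0.400),
    (49, "u", -0.402, 0.230), (53, "u", -0.421, 0.212), (57, "u", -0.220, 0.046),
    (62, "d", 0.367, 0.152), (66, "u", -0.593, 0.179), (71, "u", -0.701, 0.213),
    (77, "u", -0.451, 0.022), (82, "u", -0.271, 0.131), (86, "d", 0.622, 0.412),
    (89, "u", -0.438, 0.315), (93, "u", -0.520, 0.038), (98, "d", 0.524, 0.111),
]


def is_bipolar(rows):
    """Content factor: desirable items load positively, undesirable items negatively."""
    return all((row[2] > 0) == (row[1] == "d") for row in rows)


def acq_all_positive(rows):
    """ACQ factor: every loading is positive."""
    return all(row[3] > 0 for row in rows)


def mean_abs(values):
    values = list(values)
    return sum(abs(v) for v in values) / len(values)


print(is_bipolar(LIE), acq_all_positive(LIE))
print(round(mean_abs(r[2] for r in LIE), 2), round(mean_abs(r[3] for r in LIE), 2))
print(sum(1 for r in LIE if r[3] > 0.30))  # ACQ loadings above 0.30
```

The averages round to the 0.47 (content) and 0.20 (ACQ) reported for the Lie scale, and five ACQ loadings exceed 0.30, which is consistent with the minimum of 3–4 such loadings discussed in the Results.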


2. Method

2.1. Participants and procedure

The total group of participants consisted of 2322 undergraduate students from the Psychology and Social Sciences faculties of a Spanish university (mean age 22; 75% female). Of these, 1981 were administered the EPQ-R Lie scale and 847 the MCSD scale; 506 respondents completed both measures. The questionnaires were administered in paper-and-pencil format and were completed voluntarily in classroom groups of 25–60 students. For both measures, and in all cases, the instructions given before administration were the standard instructions provided in the Spanish EPQ manual.

2.2. Measures

The study used the Spanish adaptation of the Lie scale by Aguilar, Tous, and Andrés (1990) and the Spanish adaptation of the MCSD scale by Ferrando and Chico (2000). In both cases, the number and order of the items correspond to those in the original version. In the case of the MCSD scale, however, items 5, 6 and 20 were omitted from the analyses because of the content redundancy discussed above (one item from each redundant pair was dropped).

3. Results

3.1. Calibration stage

The FA model was fitted separately to the N = 1981 (Lie) and N = 847 (MCSD) subgroups. In the first step, we fitted the one-factor model (i.e. content only) and the two-factor model (content and AR) to the data, and assessed the improvement in fit when two factors were used instead of one. We used two goodness-of-fit measures that are appropriate for the least-squares procedure: the gamma goodness-of-fit index (γ-GFI) and the root mean square of the standardised residuals (RMSR-S; McDonald & Mock, 1995). The results are in the upper panel of Table 1.
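The RMSR-S index can be sketched as the root mean square of the off-diagonal residual correlations, treating the residuals of the (standardized) correlation matrix as the standardized residuals. This is an assumed reading of the index; the γ-GFI is not reproduced here, and the observed and model-implied matrices below are hypothetical.

```python
import math


def rmsr(R, R_hat):
    """Root mean square of the off-diagonal residuals r_ij - r_hat_ij (i < j)."""
    n = len(R)
    residuals = [R[i][j] - R_hat[i][j] for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


# Hypothetical observed correlations and two hypothetical model-implied matrices
R = [[1.00, 0.30, 0.20],
     [0.30, 1.00, 0.25],
     [0.20, 0.25, 1.00]]
R_one = [[1.00, 0.28, 0.22],
         [0.28, 1.00, 0.25],
         [0.22, 0.25, 1.00]]
R_two = [[1.00, 0.30, 0.21],
         [0.30, 1.00, 0.25],
         [0.21, 0.25, 1.00]]
print(round(rmsr(R, R_one), 4), round(rmsr(R, R_two), 4))
```

A smaller value for the richer model mirrors the pattern in Table 1(a), where RMSR-S drops slightly when the second factor is added.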

The results in Table 1(a) can be summarized as follows. In both scales, (a) the one-factor model fits reasonably well, and (b) there is a slight but noticeable improvement in fit when two factors are used instead of one. These results suggest that both scales essentially measure a single content factor, as expected. However, they also support the hypothesis that the scales measure a secondary, residual factor. The main factor was hypothesized to be the content factor (SD) that the scales are intended to measure, and the second factor was hypothesized to be the ACQ factor. The plausibility of these hypotheses was then assessed by examining the two-factor solutions, which are shown in Tables 2 and 3.

In both solutions, the main factor has the perfect bipolar structure that would be expected of the content factor. The items that describe undesirable behaviours have negative loadings, whereas the loadings of those which describe socially desirable behaviours are positive. Furthermore, in both solutions the loadings on this factor are, in general, acceptably high for personality items. The average of the loadings (in absolute value) on this factor is 0.47 (Lie) and 0.41 (MCSD). As expected, the structure of the content factor of the Lie scale is somewhat stronger than that of the MCSD scale.

Table 1
Goodness-of-fit results and relative importance of the two factors.

(a) Goodness-of-fit results
                      MCSDS                        Lie scale
                      γ-GFI         RMSR-S         γ-GFI         RMSR-S
1-factor solution     0.94          0.07           0.94          0.08
2-factor solution     0.95          0.06           0.96          0.06

(b) Measures of relative importance
                      MCSDS                        Lie scale
                      F1 (Content)  F2 (Acq)       F1 (Content)  F2 (Acq)
Percentage of C.V.    28            9              38            8
Marginal rxx          0.87 (0.18)   0.67 (0.06)    0.83 (0.19)   0.50 (0.05)

Note: γ-GFI = gamma goodness-of-fit index; RMSR-S = root mean square of the standardised residuals; C.V. = common variance. Values within brackets are the marginal rxx based on a single item.

In both scales, the second factor is far weaker. Experience in FA suggests that a minimum of 3–4 variables with loadings larger than 0.30 are needed if a factor is to be adequately defined (McDonald, 1985). Both scales meet this minimum requirement, but little more. The average absolute loadings are virtually the same in both scales: 0.20 (Lie) and 0.19 (MCSD). As for interpretation, in the Lie scale all the loadings on this factor are positive, which is to be expected if this factor is the ACQ factor. In the MCSD scale, however, three items have negative loadings (all three below 0.20 in absolute value). Inspection of the stems shows that these items are precisely the three that use double-negative modifiers in the Spanish version: “never” and “nobody” (item 4), “no” and “never” (item 29), and “never” and “nothing” (item 33). A plausible explanation is that respondents considered a negative response to these items as agreement with the stem. This is, of course, only a conjecture. Because none of the three items was part of the balanced subset, the analysis was performed again without them. As in the full analysis, the content factor was perfectly bipolar, but this time all 27 loadings on the second factor were positive. Overall, then, it seems reasonable to interpret this second factor as the ACQ factor.

The lower panel of Table 1 provides further evidence on the relative importance of the two factors. The first row shows the percentages of common variance accounted for by each factor as estimated by MRFA. The percentage contributed by the content factor is rather high in both scales; in fact, it is higher than the percentages obtained in previous studies with measures of substantive traits (see Ferrando et al., 2009). This result reinforces the hypothesis that SD is a truly individual-differences variable. The percentage is larger in the Lie scale, which a priori was expected to have a clearer and stronger structure. For the second factor (ACQ), the percentage is far lower and about the same in both scales.


Table 3
Two-factor solution: MCSDS.

Item  Abbreviated item content                              Dir.     F1      F2
1     Investigate before voting                             d     0.245   0.022
2     Help someone in trouble                               d     0.562   0.113
3     Hard to work if not encouraged                        u    -0.418   0.442
4     Never disliked anyone                                 d     0.412  -0.161
7     Careful about manner of dress                         d     0.203   0.294
8     At home, as good manners as in a restaurant           d     0.498   0.002
9     Get into a movie without paying                       u    -0.381   0.415
10    Give up because thought too little of my ability      u    -0.152   0.264
11    Like gossiping                                        u    -0.332   0.392
12    Rebel against people with authority                   u    -0.512   0.326
13    Always a good listener                                d     0.625   0.014
14    Never “play sick”                                     u    -0.250   0.369
15    Take advantage of somebody                            u    -0.591   0.388
16    Always admit a mistake                                d     0.661   0.012
17    Practice what you preach                              d     0.399   0.080
18    Get along with loud mouthed, obnoxious people         d     0.024   0.186
19    Get even rather than forgive and forget               u    -0.496   0.407
21    Always courteous                                      d     0.530   0.029
22    Insisted on my own way                                u    -0.370   0.120
23    Sometimes, feel like smashing things                  u    -0.238   0.299
24    Let someone be punished for my wrong-doings           d     0.505   0.006
25    Never resent being asked to return a favor            d     0.405   0.066
26    Never irked when others express different ideas       d     0.567   0.034
27    Check the safety of my car, bike, motorbike, . . .    d     0.337   0.026
28    Jealous of the other’s good fortune                   u    -0.383   0.307
29    Never tell someone off                                d     0.422  -0.194
30    Irritated by people who ask favors of me              u    -0.498   0.195
31    I have never been punished without cause              d     0.413   0.018
32    When having a misfortune they got what they deserved  u    -0.212   0.315
33    Never hurt someone’s feelings                         d     0.608  -0.153

Note: u = undesirable item; d = desirable item.
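As a cross-check of the reading of Table 3 given in the Results, the MCSD loadings can be transcribed and queried directly. This is a minimal sketch with hypothetical variable names; note that simply dropping the three rows from the table is not the same as the re-analysis reported in the text, which refitted the model without those items.

```python
# (item, direction, F1 content loading, F2 ACQ loading), transcribed from Table 3
MCSD = [
    (1, "d", 0.245, 0.022), (2, "d", 0.562, 0.113), (3, "u", -0.418, 0.442),
    (4, "d", 0.412, -0.161), (7, "d", 0.203, 0.294), (8, "d", 0.498, 0.002),
    (9, "u", -0.381, 0.415), (10, "u", -0.152, 0.264), (11, "u", -0.332, 0.392),
    (12, "u", -0.512, 0.326), (13, "d", 0.625, 0.014), (14, "u", -0.250, 0.369),
    (15, "u", -0.591, 0.388), (16, "d", 0.661, 0.012), (17, "d", 0.399, 0.080),
    (18, "d", 0.024, 0.186), (19, "u", -0.496, 0.407), (21, "d", 0.530, 0.029),
    (22, "u", -0.370, 0.120), (23, "u", -0.238, 0.299), (24, "d", 0.505, 0.006),
    (25, "d", 0.405, 0.066), (26, "d", 0.567, 0.034), (27, "d", 0.337, 0.026),
    (28, "u", -0.383, 0.307), (29, "d", 0.422, -0.194), (30, "u", -0.498, 0.195),
    (31, "d", 0.413, 0.018), (32, "u", -0.212, 0.315), (33, "d", 0.608, -0.153),
]

# Items whose loading on the second (ACQ) factor is negative
negative_acq = [(item, f2) for item, _, _, f2 in MCSD if f2 < 0]
print(negative_acq)
print(all(abs(f2) < 0.20 for _, f2 in negative_acq))

# Bipolar content factor and average absolute loadings on both factors
print(all((r[2] > 0) == (r[1] == "d") for r in MCSD))
print(round(sum(abs(r[2]) for r in MCSD) / len(MCSD), 2),
      round(sum(abs(r[3]) for r in MCSD) / len(MCSD), 2))
```

The query returns items 4, 29 and 33, all below 0.20 in absolute value, and average absolute loadings that round to the 0.41 (content) and 0.19 (ACQ) reported in the text.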


3.2. Scoring stage

Analyses at this stage were based on the group of 506 respondents who had been administered both measures. Factor scores were first obtained for both factors, and marginal reliability estimates were then obtained for both sets of factor scores. These estimates are shown in the second row of Table 1(b), and they clearly agree with the above results: the reliability of the content scores is acceptable in both scales, whereas the reliability of the ACQ scores is far lower. Previous studies based on the MCSD and Lie raw scores, with both the original and the Spanish versions, obtained reliability estimates in the range 0.72–0.82 (MCSD) and 0.71–0.80 (Lie). So, for both scales, the content reliability estimates based on the (theoretically) more accurate factor scores are larger than the previous estimates based on raw scores. The fact that the ACQ scores are much less reliable agrees with the interpretation that ACQ is a weak, secondary factor. Finally, we note that the reliability estimates of the two scales are not directly comparable, because the MCSD scale has (here) 30 items and the Lie scale 21. The estimates were therefore stepped down to a single item using the Spearman–Brown prophecy formula (the values in brackets in Table 1(b)). The results suggest that, when equated for test length, the scores provided by both scales are equally reliable, both for the content factor and for ACQ.
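The bracketed single-item values in Table 1(b) follow from the Spearman–Brown formula solved for a test one item long, ρ1 = ρk / (k − (k − 1)ρk). A short check, with a hypothetical function name and the reported (rounded) reliabilities as input:

```python
def step_down(rel, n_items):
    """Spearman–Brown: reliability of a 1-item test from an n-item reliability.

    rho_1 = rho_n / (n - (n - 1) * rho_n)
    """
    return rel / (n_items - (n_items - 1) * rel)


# Marginal reliabilities from Table 1(b) and test lengths (30 MCSD items, 21 Lie items)
table_1b = {
    ("MCSD", "content"): (0.87, 30),
    ("MCSD", "ACQ"): (0.67, 30),
    ("Lie", "content"): (0.83, 21),
    ("Lie", "ACQ"): (0.50, 21),
}
for key, (rel, k) in table_1b.items():
    print(key, round(step_down(rel, k), 2))
```

The printed values reproduce the bracketed 0.18, 0.06, 0.19 and 0.05, which is what allows the two scales to be compared on an equal-length basis.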

Next, we obtained the correlations between the Lie and MCSD content scores on the one hand, and between the Lie and MCSD ACQ scores on the other. In the first case the correlation was 0.66, which increased to 0.78 when corrected for attenuation using the reliability estimates in Table 1(b). This result, which agrees with those obtained in previous studies (Davies, French, & Keogh, 1998), suggests that the content dimensions measured by the two scales overlap considerably but are not exactly the same. It supports Paulhus’s (1991) hypothesis that the two scales measure somewhat different aspects of the general SD dimension.

The correlation between the ACQ factor scores derived from the two scales was 0.24, which increased to 0.41 when corrected for unreliability. This agrees with the results obtained by Ferrando, Condon, and Chico (2004) and reinforces the hypothesis that AR has some degree of convergent validity, although possibly not enough to be of clear practical interest.
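The corrected correlations reported in this and the preceding paragraph follow from the classical correction for attenuation, r / sqrt(rxx · ryy), applied with the marginal reliabilities in Table 1(b). A short check (the function name is hypothetical):

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Observed correlations from the text; reliabilities from Table 1(b)
content_r = disattenuate(0.66, 0.87, 0.83)  # MCSD vs Lie content scores
acq_r = disattenuate(0.24, 0.67, 0.50)      # MCSD vs Lie ACQ scores
print(round(content_r, 2), round(acq_r, 2))
```

Because the ACQ reliabilities are low, the correction nearly doubles the ACQ correlation, whereas the content correlation rises only modestly.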

4. Discussion

The relations between personality variables are generally quite complex. Therefore, most studies that aim to assess which of two extreme alternative explanations is more appropriate conclude that, with real data, such polarized positions are too simplistic. This also seems to be the case here.

Overall, the present results suggest that both the MCSD and the Lie scales essentially measure a single dimension. As discussed above, this result was expected for the Lie scale. As for the MCSD scale, we note that essential unidimensionality was obtained here with a trimmed set of 30 items from which 3 redundant items had been omitted. In both scales, the structure of the main factor was clear and fairly strong; clearer and stronger, in fact, than most structures obtained with scales designed to measure substantive traits. This result supports the hypothesis that both SD scales measure a truly individual-differences variable. The results also suggest that the main factors measured by the two scales are highly related but not exactly the same, which agrees with Paulhus’s (1991) statement that they measure somewhat different aspects of SD. Finally, we note that the Lie scale showed a stronger structure than the MCSD scale, and that the Lie main factor explained a larger percentage of common variance. Given that the Lie scale was developed using an FA approach, this was expected.

As for the main purpose of the study, the contest between the two positions, the results obtained with both scales suggest that AR impacts the item responses to some degree. The factor scores on the second factor (ACQ) also showed a mild degree of convergent validity. So, the results support the second hypothesis. However, the factor identified as ACQ in both scales is rather weak, and although the positive correlation between the ACQ scores might upwardly bias the correlations between raw scores on SD measures, this confounding effect is expected to be small.

Theoretically, the finding that essentially pure measures of SD can also be impacted by AR seems to be of interest. However, our results do not make clear whether developing a balanced SD scale that also controls for AR is worth the effort in practice. In principle, we believe that it is. Personality measures in general are not very strong psychometrically, so obtaining “cleaner” SD measures is always desirable, even if the factor that is partialled out is residual. If taking AR into account reduces measurement error to some extent, however small, the use of balanced scales is justified. However, further evidence is needed to support this position.

The limitations of the study should be pointed out. First, most of the evidence collected here is internal: it is based on FA solutions and on the relations between the factor scores. The proposed models fit well, and the results agree quite well with expectations. However, good fit and expected results only show that our model provides a plausible explanation that is compatible with the data. Further research including external variables (i.e., external validity studies) would help to strengthen the interpretation of the present results. Finally, it would be of clear applied interest to assess whether the present results hold in other samples of different ages and socio-cultural levels, not limited to university students.

References

Aguilar, A., Tous, J. M., & Andrés, A. (1990). Adaptación y estudio psicométrico del EPQ-R. Anuario de Psicología, 46, 101–118.

Barger, S. D. (2002). The Marlowe–Crowne affair: Short forms, psychometric structure, and social desirability. Journal of Personality Assessment, 79, 289–305.

Bernhardson, C. S. (1970). Social desirability as a confounding variable in the reversed item approach to studying acquiescence in the MMPI. Canadian Journal of Behavioral Science, 2, 148–156.

Billiet, J. B., & McClendon, M. J. (2000). Modeling acquiescence in measurement models for two balanced sets of items. Structural Equation Modeling, 7, 608–628.

Couch, A., & Keniston, K. (1960). Yeasayers and naysayers: Agreeing response set as a personality variable. The Journal of Abnormal and Social Psychology, 60, 151–174.

Crowne, D., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349–354.

Davies, M. F., French, C. C., & Keogh, E. (1998). Self-deceptive enhancement and impression management correlates of EPQ-R dimensions. The Journal of Psychology, 132, 401–406.

Diers, C. J. (1964). Social desirability and acquiescence in response to personality items. Journal of Consulting Psychology, 28, 71–77.

Edwards, A. (1961). Social desirability or acquiescence in the MMPI? A case study with the SD Scale. The Journal of Abnormal and Social Psychology, 63, 351–359.

Eysenck, H. J., & Eysenck, S. B. G. (1975). Manual of the Eysenck Personality Questionnaire. London: Hodder and Stoughton.

Eysenck, H. J., & Eysenck, S. B. G. (1976). Psychoticism as a dimension of personality. New York: Crane-Russak.

Eysenck, S. B. G., Eysenck, H. J., & Barrett, P. T. (1985). A revised version of the Psychoticism scale. Personality and Individual Differences, 6, 21–29.

Ferrando, P. J., & Chico, E. (2000). Adaptación y análisis psicométrico de la escala de deseabilidad social de Marlowe y Crowne. Psicothema, 12, 383–389.

Ferrando, P. J., Chico, E., & Lorenzo, U. (1997). Dimensional analysis of the EPQ-R Lie scale with a Spanish sample: Gender differences and relations to N, E, and P. Personality and Individual Differences, 30, 641–656.

Ferrando, P. J., Condon, L., & Chico, E. (2004). The convergent validity of acquiescence: An empirical study relating balanced scales and separate acquiescence scales. Personality and Individual Differences, 37, 1331–1340.

Ferrando, P. J., Lorenzo-Seva, U., & Chico, E. (2003). Unrestricted factor analytic procedures for assessing acquiescent responding in balanced, theoretically unidimensional personality scales. Multivariate Behavioral Research, 38, 353–374.

Ferrando, P. J., Lorenzo-Seva, U., & Chico, E. (2009). A general factor-analytic procedure for assessing response bias in questionnaire measures. Structural Equation Modeling, 16, 364–381.

Greenwald, H. J., & Clausen, J. D. (1970). Test of relationship between yeasaying and social desirability. Psychological Reports, 27, 139–141.

Hofstee, W. K. B., Ten Berge, J. M. F., & Hendricks, A. A. J. (1998). How to score questionnaires. Personality and Individual Differences, 25, 897–910.

Jackson, D. N., & Messick, S. (1961). Acquiescence and desirability as response determinants on the MMPI. Educational and Psychological Measurement, 21, 771–790.

Kline, P. (1998). The new psychometrics: Science, psychology, and measurement. New York: Routledge.

Lajunen, T., & Scherler, H. R. (1999). Is the EPQ Lie scale bidimensional? Validation study of the structure of the EPQ Lie scale among Finnish and Turkish university students. Personality and Individual Differences, 26, 657–664.

McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale: LEA.

McDonald, R. P., & Mock, M. M. C. (1995). Goodness of fit in item response models. Multivariate Behavioral Research, 30, 23–40.

Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17–59). San Diego: Academic Press.

Ray, J. J. (1983). Reviving the problem of acquiescent response bias. Journal of Social Psychology, 121, 81–96.

Stricker, L. J. (1963). Acquiescence and social desirability response styles, item characteristics, and conformity. Psychological Reports, 12, 319–341.

Ten Berge, J. M. F., & Kiers, H. A. L. (1991). A numerical approach to the approximate and the exact minimum rank of a covariance matrix. Psychometrika, 56, 309–315.