
Article

It's Not Me, It's You: Miscomprehension in Surveys

Ben Hardy¹ and Lucy R. Ford²

Abstract

The ubiquity of surveys in organizational research means that their quality is of paramount importance. Commonly this has been addressed through the use of sophisticated statistical approaches with scant attention paid to item comprehension. Linguistic theory suggests that while everyone may understand an item, they may comprehend it in different ways. We explore this in two studies in which we administered three published scales and asked respondents to indicate what they believed the items meant, and a third study that replicated the results with an additional scale. These demonstrate three forms of miscomprehension: instructional (where instructions are not followed), sentential (where the syntax of a sentence is enriched or depleted as it is interpreted), and lexical (where different meanings of words are deployed). These differences in comprehension are not appreciable using conventional statistical analyses yet can produce significantly different results and cause respondents to tap into different concepts. These results suggest that item interpretation is a significant source of error, which has been hitherto neglected in the organizational literature. We suggest remedies and directions for future research.

Keywords

survey research, quantitative research, construct validation procedures

"How satisfied are you with the pay you receive for your job?" (Tsui, Egan, & O'Reilly, 1992). This seems a straightforward question that anyone who is employed should be able to understand and answer. But what does it actually mean? Is it asking whether you are happy with the pay you receive for your job, or whether you think the amount you earn is fair for the work you do? Or something else? No doubt you will understand both the question and its meaning. The crucial issue, however, is whether others understand it in exactly the same way as you.

¹ The Open University Business School, Milton Keynes, United Kingdom
² Saint Joseph's University, Philadelphia, PA, USA

Corresponding Author:
Lucy R. Ford, Saint Joseph's University, 5600 City Avenue, Philadelphia, PA 19131, USA.
Email: [email protected]

Organizational Research Methods
1-25
© The Author(s) 2014
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/1094428113520185
orm.sagepub.com

Organizational researchers often solicit the opinion of others through surveys. This frequently involves administering a stimulus, in the form of a question or statement,¹ and allowing the participant to choose from a limited menu of responses. Closed questions of this nature allow a verbal (analog) signal to be converted to a numerical (digital) output, through the allocation of ordinal numbers to Likert scale responses, and the consequent output to be subjected to statistical examination. The advantage of this process of transduction is also its weakness, as the simple numerical output masks infelicities in comprehension of the instruction, question, or response. Individuals agreeing with a statement may not necessarily be agreeing with the same thing as other respondents, and even sophisticated statistics may not detect these differing interpretations. The whole enterprise of survey research rests on the assumption that there is an unbroken chain of comprehension from the mind of the researcher, through the survey instrument, to the mind of the respondent, and back again. Miscomprehension at any stage in this process introduces error.
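The masking effect of this transduction can be illustrated with a minimal Python sketch. The label-to-number coding, the two respondent records, and the interpretation strings are all hypothetical; they serve only to show that the numerical output carries no trace of how an item was understood.

```python
# Hypothetical 5-point coding of response labels
# (an assumption for illustration, not the scheme of any cited scale).
LIKERT_CODES = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}

# Two hypothetical respondents who read the same pay item differently
# but choose the same response label.
respondents = [
    {"interpretation": "Am I happy with my pay?", "answer": "satisfied"},
    {"interpretation": "Is my pay fair for the work I do?", "answer": "satisfied"},
]

# Transduction: only the ordinal code survives; the interpretation is discarded.
scores = [LIKERT_CODES[r["answer"]] for r in respondents]
distinct_interpretations = {r["interpretation"] for r in respondents}

print(scores)                         # the data the analyst sees
print(len(distinct_interpretations))  # the variation those data cannot show
```

Any statistic computed from `scores` alone treats the two respondents as identical, which is the weakness the paragraph above describes.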

A survey that is used as the basis for strategy or policy and that is poorly constructed and ignores different interpretations of questions could have profoundly negative effects. As a consequence, a great deal of effort has been put into improving standards of measurement. This research has mainly focused on using statistics to assay scale quality, with little attention paid to the stimulus questions themselves and the way in which individuals comprehend them.

This article examines sources and types of linguistic miscomprehension in survey research, using published, multi-item scales. We begin with a brief review of scale development and some of the principles of linguistics. We then present three studies that explore miscomprehension in survey research. The first study shows that while participants understand survey questions, they understand them in different ways. Using existing linguistic theory we code the results into three forms of miscomprehension. The second study tests this taxonomy by presenting respondents with a stimulus question and asking them to select, from a list of possible interpretations, the interpretation of the question that most closely matches their own. We find that participants commonly depart from the strict syntax of the item in their interpretations. This threatens construct validity and can have implications for item score on the scale itself and can impact on other scales, in this case turnover intention. In the third study we replicate the findings of the first two studies using a different measure to establish that our findings are not particular to our scale selection. These three studies demonstrate that respondents interpret items differently, that this threatens construct validity, and yet is not apparent when standard statistical tests to assess factor structure and validity are used. We then examine the import of these findings for organizational research, suggest remedies, and outline directions for future research.

A Brief Review of Scale Development

The process of scale development has been discussed in a number of texts (e.g., DeVellis, 2003; Hinkin, 1998). These generally aim to fulfill the American Psychological Association guidelines, which center around content validity, criterion-related validity, construct validity, and internal consistency reliability (Hinkin, 1998).

The first step is to define the concept of interest and its domain. Poorly specified concepts and inadequate domain sampling will guarantee an inadequate scale. The next steps are elegantly summarized by Hinkin (1998). They begin with developing items that either inductively or deductively sample the conceptual domain (Hinkin, 1995). If the resulting items are poorly developed, then it is unlikely that the subsequent stages of the developmental processes will remedy this. Unfortunately, this critical step of item development is seldom accorded appropriate emphasis (Schriesheim, Powers, Scandura, Gardiner, & Lankau, 1993), with DeVellis (2003) suggesting that researchers often "throw together" or "dredge up" items and assume they constitute a suitable scale (p. 11).

Hinkin (1998) advocates a more rigorous process, where parsimonious, readily comprehensible questions are written and construct validity is examined using multiple samples and techniques such as exploratory and confirmatory factor analysis. Despite the importance of the initial stages of item development (Hinkin, 1995), greater emphasis is often placed on the statistical assessment of the psychometric properties of the scale (Rossiter, 2002) and its relationship to other variables in the nomological net (Borsboom, Mellenbergh, & van Heerden, 2004).

Researchers may choose instead to use published measures. Passage through the peer review process has typically been perceived as evidence of scale quality. However, Ford and Scandura (2007), in an examination of a compilation of organizational measures (Fields, 2002), found that the majority of scales contained one or more threats to construct validity, suggesting that published measures are not without flaws. Using a published measure also involves disembedding it from its original context, potentially increasing the risk of error. Comprehension (u) is a function of syntax (s) and context (c); u = f(s, c), and so the same syntax may be comprehended differently in different contexts. For example, saying "break a leg" means something different in a theater dressing room and an operating theater. The importance of linguistics in item interpretation has not been widely discussed although there are some existing sources that do address the issue (Schwarz, 1999), and it is to this topic we now turn.

A Brief Review of Linguistic Theory Surrounding Item Interpretation

Surveys hinge on comprehension. If the respondent does not understand the survey question in exactly the same way as the researcher then the instrument is not measuring what the researcher intended. This interface between researcher and respondent is of critical importance and is where linguistic problems of interpretation manifest themselves. Communication depends on one person's statements being understood by another. This, however, is not enough, as understanding what another person is saying is one thing, while understanding exactly what they mean is another.

Researchers in the pragmatic tradition of semantics make a clear distinction between what is said and the context (both social and linguistic) in which it is said, the one influencing the other. The interplay between the contextual and literal was articulated by Grice (1975) and subsequently modified and extended by other authors (e.g., Jaszczolt, 2005; Sperber & Wilson, 1986). The principle underpinning this field is that when we interpret a sentence, we flesh out the bare syntax of a sentence, drawing on our experience, context, and environment. This process, however, is neither uniform nor predictable, varying across individuals and situations. These variations mean that survey questions may be fleshed out by individuals to give meanings other than the one intended by the question's author. Consequently, this article is concerned with cataloging these differences and examining their impact on survey research.

Types of Error

Threats to comprehension, and hence validity, fall into two basic categories: instructional and interpretive. Interpretive errors can then be broken down into two further categories commonly used within the psycholinguistic literature (e.g., Hernandez, 2001). One concerns the comprehension of the full sentence, or sentential comprehension, and the other the comprehension of individual words, or lexical comprehension.

Instructional Miscomprehension. This is the most easily understood source of error, where the respondent either does not read/follow the instructions for completing the survey or they misunderstand the instructions (Tourangeau, Rips, & Rasinski, 2000). This failure to follow or understand instructions may not be evident in surveys with a numerical output, yet instructions are of pivotal importance. For some surveys they provide direction as to what is actually being measured. In others, the instructions might contain the experimental manipulation through a change in wording. In either case, the results of the research are affected if the instructions are ignored. In short, instructions are important.


Interpretive Miscomprehension: Sentential. When hearing a question we attempt to understand what it means. This involves making decisions, both conscious and unconscious, about what the questioner is actually trying to ask. For example, the question "Have you had lunch?" at a 2 pm meeting is likely to be interpreted as "Have you had lunch today?" rather than "Have you ever had lunch?" Understanding a question, therefore, is not a banausic process of literally answering what has been asked, but rather one of applying contextual information in order to answer appropriately. Crucially, we must reach the same interpretation as the author of the question, otherwise we are likely to answer a different question from the one the author intended. This is sentential miscomprehension.

Respondents might enrich or deplete the meaning of questions, or both. When an item is enriched, the respondent adds additional information to the stimulus. For example, if the question "Considering everything how satisfied are you with your current job situation?" (Tsui et al., 1992) is interpreted by the respondent as "Would you stay in your job if someone offered you something else?",² they have enriched the sentence to include elements of turnover intention that were not intended. This may mean that the respondent is actually answering a question about turnover intention as opposed to just job satisfaction.

Just as sentences can be enriched they can also be depleted. The question "How fair or unfair are the procedures used to determine pay rates?" (Sweeney & McFarlin, 1993) shows depletion if interpreted as "How fair is your pay?" The respondent has clearly understood the fairness element of the question but not the procedural part and has, effectively, turned a procedural justice item into a distributive justice one.

Interpretive Miscomprehension: Lexical. This form of miscomprehension concerns the meaning of the words themselves. One person's definition of a word does not necessarily accurately map onto that of another's because they are drawing on a variety of educational, cultural, social, contextual, or gender-specific definitions.

The word "satisfaction" has two historical meanings (Simpson & Weiner, 1989). One is "with reference to desires or feelings" (Simpson & Weiner, 1989, p. 502) and is described as "The action of gratifying (an appetite or desire) to the full a sense of pleasurable gratification" (p. 502); the other, "with reference to obligations" (p. 502), is a more transactional sensation of obligation having been fulfilled. Depending on exposure and knowledge, individuals interpreting the word "satisfaction" may draw on one or the other interpretation, or a blend of both. The issue for survey researchers is that it is very difficult to know which definition the respondent is drawing on. For example, two different respondents may both agree with the statement "Are you satisfied with this company" but one might be agreeing that they like the company while the other feels that the company has met its obligations.

The possibility of lexical miscomprehension resulting from this polysemy, or multiple meanings, is not restricted to the word "satisfied." The Collins English Dictionary lists 43,636 different nouns and 14,190 different verbs. The average noun has 1.74 meanings and the average verb 2.11 (Fellbaum, 1990), suggesting that there is plenty of opportunity for respondents to draw on more than one meaning.

Lexical miscomprehension can introduce primary error, where the item's miscomprehension affects its score, and also secondary error, where miscomprehension causes collinearity between scales. This could occur if, for example, a question about satisfaction were included in a model along with an instrument for turnover intention. The responses of those using a transactional interpretation of satisfaction should correlate with turnover intention but the responses of those taking a gratificational view may not.

These three forms of miscomprehension, instructional, sentential, and lexical, have the potential to introduce considerable error into the measurement process. We shall now turn to two studies that provide evidence for the existence of these forms of miscomprehension and a third, which confirms our preliminary findings using a different scale.





Study 1: Respondent Interpretation of Survey Question

This study aims to ascertain whether the forms of miscomprehension outlined previously occur with questions used in organizational research.

Method

The scales used in this study were selected on the basis of three criteria. First, Ford and Scandura (2007) did not identify any threats to construct validity in these scales in their analyses of all the scales contained in Fields's (2002) book of organizational measures. Second, they contained a mix of both questions and statements so that comparisons could be made between a satisfaction scale using questions (Tsui et al., 1992) and one containing statements (Agho, Price, & Mueller, 1992; drawn from a longer measure in Brayfield & Rothe, 1951). Finally they were brief. This last point was of particular importance in order to minimize the potential for survey fatigue. Two multiple item scales for job satisfaction (Agho et al., 1992, 6 items; Tsui et al., 1992, 6 items) and one for procedural justice (McFarlin & Sweeney, 1992, 4 items) were used. The job satisfaction measures were also different in that one (Tsui et al., 1992) is a general measure of job satisfaction that measures specific facets such as satisfaction with the work itself, supervision, and coworkers, while the other (Agho et al., 1992) was intended as an affective measure of job satisfaction. The papers in which these scales appeared have been widely cited in organizational research and the scales themselves used frequently in subsequent research.

The survey was administered using Qualtrics (2007). The respondents were first asked to explain what they thought the survey question meant, in a free-text, open-ended response format, imagining that they were explaining the item to a non-native English speaker and attempting to convey the true meaning of the item. We then administered the same three scales in their usual format with Likert-type responses. Finally, we captured standard demographic data such as gender, educational level, and native language.

Sample. For this initial exploratory study, we used three convenience samples of participants. First, the authors sent an invitation to participate to their personal contacts, with a request that participants forward the survey to others. We used this method as variance in pragmatic inference is universal (Sperber & Wilson, 1986), and so a nonrandom sampling method was appropriate. We also wanted respondents who had the intellectual capacity to think through the meaning of items carefully. Therefore, sampling our own contacts made sense, as our contacts are typically well educated.

Our final sample comprised a total of 115 respondents. Forty-one of these were native speakers of British English (BrE) (average age = 34.3, SD = 11.9; 42% female, 58% had a master's degree or higher), and 40 were native speakers of American English (AmE) (average age = 37.1, SD = 11.6; 52% female, 60% had a master's degree or higher). We selected speakers of British English and American English as we were concerned that there might be differences in interpretation between these two forms that are strongly represented in organizational research and represent the two forms of English that are taught internationally. We also asked members of the RMNet listserv (a listserv restricted to members of the Research Methods Division of the Academy of Management) to complete the survey (n = 34, average age = 44.5, SD = 12.2; 48% female, 91% had a master's degree or higher) as those with an interest in research methodology may assist colleagues in developing surveys. The BrE and AmE samples were broadly similar in terms of age, gender profile, and educational attainment, while the RMNet sample was slightly older and more highly educated, as would be expected.


Classification of Open-Ended Responses. Responses were classified manually using NVivo 8 (2008) by two coders and interrater reliability statistics (Cohen's kappa, κ; Cohen, 1960, as implemented in SPSS) were calculated.
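For readers unfamiliar with the statistic, the following Python sketch shows how Cohen's kappa corrects the observed agreement between two coders for the agreement expected by chance. It is not the SPSS routine used in the study; the function name and the ten coded responses are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's (1960) kappa for two coders labelling the same responses."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of responses")
    n = len(coder_a)
    # Observed agreement: share of responses given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for ten open-ended responses (illustrative, not study data).
coder_1 = ["depleted", "neutral", "neutral", "depleted", "neutral",
           "neutral", "depleted", "neutral", "neutral", "neutral"]
coder_2 = ["depleted", "neutral", "neutral", "neutral", "neutral",
           "neutral", "depleted", "neutral", "neutral", "neutral"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

In this made-up example the coders agree on nine of ten responses, yet kappa is noticeably lower than 0.90 because most responses fall into a single category, which makes agreement by chance likely.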

Instructional miscomprehension was identified and coded when a respondent failed to follow the instructions. The instructions were in a bordered box at the top of the first page and clearly stated in bold, italicized, block capitals: "DO NOT ATTEMPT TO ANSWER THE QUESTION." Respondents who answered the question anyway demonstrated instructional miscomprehension, as did those who did not follow the ensuing instructions to explain the item.³

In order to identify sentential miscomprehension we examined the responses for deviation from the syntax of the question, in the form of enrichment or depletion. Enrichment was defined as the respondent venturing beyond a strict syntactic interpretation of the question by including other conceptual elements. Depletion, on the other hand, was defined as the absence of an element of the question in the answer provided. Depending, of course, on the nature of the item, it is possible to simultaneously enrich and deplete an item. For instance, a respondent who interprets the question "How fair or unfair are the procedures used to determine pay rates?" (Sweeney & McFarlin, 1993) as "How transparent is compensation?" simultaneously depletes the question by not asking about the (un)fairness of pay rates (i.e., removing a conceptual element) and also enriches it by expanding from pay to compensation.

Lexical miscomprehension is difficult to apprehend, as it is impossible to know what mental schema the respondent is drawing on, so interpretation of the response can only be made by inference from the rest of the sentence. The definition of satisfaction in the question "How satisfied are you with the nature of the work you perform" (Tsui et al., 1992) was coded as pleasurable if words indicating pleasure (e.g., happy) were included in the interpretation and transactional if the interpretation included phrases indicating that it matched or met their expectations. Questions where no classification could be made were coded as neutral.

Other words that proved tractable to classification for lexical miscomprehension included questions where a vague term such as "often" or "most" was used, and there were responses that quantified what these words meant, for example "often" being interpreted as "3/5 days." Lexical ambiguity could also be seen in interpretations of such items as "I like my job better than the average worker" (Agho et al., 1992), which begged the question of how you define "the average worker." The differing referents for "the average worker" were readily classifiable in the responses.

Finally, analyses were conducted using SPSS to establish the reliability and dimensionality of the measures used in the survey.

Results

Statistical Tests for Dimensionality. Coefficient alpha (Cronbach, 1951) for the measures ranged between .79 and .91, which was consistent with or better than alphas previously reported for these measures (Fields, 2002). In addition, we used exploratory factor analysis to establish unidimensionality of each measure (Conway & Huffcutt, 2003). We used maximum likelihood extraction and direct oblimin rotation. All items had factor loadings exceeding .40, with almost all exceeding .50, indicating unidimensionality of each measure.
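As a reminder of what coefficient alpha summarizes, the sketch below computes it from the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. It uses only the Python standard library rather than SPSS, and the function name and the three-item data set are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha (Cronbach, 1951).

    item_scores holds one list per item; each list contains that item's
    scores for every respondent, in the same respondent order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("every item needs scores from the same two or more respondents")
    # Each respondent's total score across the k items.
    totals = [sum(per_respondent) for per_respondent in zip(*item_scores)]
    sum_item_variances = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical Likert scores: three items answered by five respondents.
items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 1, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Alpha rises when items covary strongly, whatever the reason for that covariation, so a high value cannot by itself show that respondents interpreted the items as intended.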

Results of Coding Open-Ended Responses

Instructional Miscomprehension. This was readily detected when respondents answered the question rather than describing what the question meant. Eight respondents consistently answered the questions instead of describing them (BrE 3/41; AmE 2/40; RMNet 3/34). RMNet members in particular demonstrated another form of instructional miscomprehension. Nine of 34 wrote responses such as "job satisfaction/facet is pay," "procedural justice," or "Need a Likert scale response," which are clearly incompatible with the instructions. Overall, 17 of 115 respondents (15%) provided answers suggesting that they had not read the instructions properly.

Sentential Miscomprehension. Examination of the sample revealed evidence of both enrichment and depletion, with little difference in degree of miscomprehension between the three groups of respondents. Accordingly they were combined. The coding was completed by two raters, with excellent interrater reliability (κ = 0.85-0.90). Depletion was particularly evident with the items from the procedural justice scale (Sweeney & McFarlin, 1993) where respondents ignored the procedural element of the question, such that "How fair or unfair are the procedures used to determine pay rates?" was interpreted as "How fair is your pay?" Effectively this turned a procedural justice item into a distributive justice one. Overall 27% of respondents ignored the procedural element of the question (BrE 30%, AmE 32%, RMNet 16%).

Five respondents said that they did not understand the item "How fair or unfair are the procedures used to communicate performance feedback." In spite of this they were still able to provide a numerical response in the multiple-choice section, despite the option not to do so. Therefore their lack of understanding is undetectable in the statistical data. Up to 44% of respondents depleted any given item, as can be seen in Table 1.

Table 1. Number of Respondents Coded as Depleting or Enriching Items.

Depletion (κ = 0.87) and Enrichment (κ = 0.85) are reported as N (%).

Depletion   Enrichment   Scale and item
---------   ----------   --------------
                         Tsui, Egan, and O'Reilly (1992)
51 (44)     46 (40)      1. How satisfied are you with the opportunities which exist in this organization for advancement or promotion
20 (17)     40 (35)      2. How satisfied are you with the nature of the work you perform
5 (4)       39 (34)      3. How satisfied are you with the person who supervises you, your organizational superior
11 (10)     34 (30)      4. How satisfied are you with your relations with others in the organization with whom you work, your coworkers and peers
35 (30)     35 (30)      5. How satisfied are you with the pay you receive for your job
20 (17)     24 (21)      6. Considering everything, how satisfied are you with your current job situation

                         Sweeney and McFarlin (1993)
28 (24)     38 (33)      1. How fair or unfair are the procedures used to communicate performance feedback
10 (9)      30 (26)      2. How fair or unfair are the procedures used to determine pay rates
19 (17)     28 (24)      3. How fair or unfair are the procedures used to evaluate performance
26 (23)     38 (33)      4. How fair or unfair are the procedures used to determine promotions

                         Agho, Price, and Mueller (1992)
26 (23)     63 (55)      1. I am often bored with my job
5 (4)       26 (23)      2. I feel fairly well satisfied with my present job
7 (6)       60 (52)      3. I am satisfied with my job for the time being
12 (10)     38 (33)      4. Most days I am enthusiastic about my work
14 (12)     24 (21)      5. I like my job better than the average worker does
0 (0)       0 (0)        6. I find real enjoyment in my work

19.3 (17)   37.5 (33)    Mean

Enrichment was far more common, which is understandable as linguistic theory suggests that individuals are more likely to augment basic sentence syntax to recover meaning. Table 1 demonstrates that items were enriched by 21% to 55% of the respondents. There was no clear pattern to the enrichment, with the exception of the statement "I am satisfied with my job for the time being" (Agho et al., 1992), which seemed to trigger an association with turnover intention. This enrichment was not uniform, however, as one respondent interpreted it as "I am currently happy with my job, but I may look for a new job in the future" and another as "I'll be out of here at the first opportunity." Despite these very different interpretations, both these respondents agreed with this item in their Likert response. It is clear then that the numerical output from multiple item scales can mask considerable linguistic variance, as two opposing interpretations had the same score (4/5).

No statistical difference was detectable between the depleted or enriched responses and the rest of the sample on the basis of a Mann-Whitney U test (a nonparametric comparison test appropriate for non-normal small samples) applied to the Likert responses for each category. So individuals are answering different questions, sometimes radically so, and yet this is undetectable statistically.
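As a rough illustration of the comparison described above, the following Python sketch computes the Mann-Whitney U statistic for two groups of Likert scores by counting pairwise wins, with ties counted as half. The scores and group names are hypothetical, and the sketch stops at the statistic itself: a p value requires the exact null distribution or a tie-corrected normal approximation, which a statistics package supplies.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs in which a score from group_a exceeds a score
    from group_b, with tied pairs counted as one half.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one score")
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two statistics always sum to the number of pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical Likert responses (1-5) to a single item.
enriched = [4, 4, 5, 3, 4]       # respondents coded as enriching the item
others = [4, 3, 5, 4, 4, 2]      # the rest of the sample

u_enriched, u_others = mann_whitney_u(enriched, others)
expected_under_null = len(enriched) * len(others) / 2
print(u_enriched, u_others, expected_under_null)
```

When both U values sit close to half the number of pairs, as they do for these made-up data, the test finds no difference between the groups even though the two groups may have been answering different questions.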

Lexical Miscomprehension. Despite the intractability of detecting lexical miscomprehension, as previously noted, there was clear evidence of differing interpretations of the word "satisfaction," as can be seen in Table 2.

Other examples of lexical miscomprehension can be seen with the use of vague terms (Tourangeau et al., 2000) such as the word "often" in the item "I am often bored with my job" (Agho et al., 1992). For example, "often" was interpreted as varying from "More than 33% of the time" to "99% of my work." The word "most" in the statement "Most days I am enthusiastic about my work" was interpreted with responses as varied as "4/5 days," "at least 3 days/week," and "more often than not."

The final element of lexical miscomprehension examined was the comparators for the item "I like my job better than the average worker does." Thirty-four percent compared themselves to the general population, 18% to their peers, and 4% to people in similar jobs. Social comparison theory suggests that the choice of referent is critical to attitude formation (Riordan & Shore, 1997). As with sentential miscomprehension there was no statistical difference between categories on the basis of the Mann-Whitney test applied to the Likert responses for each category.

Table 2. Classification of Satisfaction by Item.

Pleasurable (κ = 0.88), Transactional (κ = 0.79), and Neutral are reported as N (%).

Pleasurable   Transactional   Neutral     Scale and item
-----------   -------------   -------     --------------
                                          Tsui, Egan, and O'Reilly (1992)
22 (19)       24 (21)         69 (60)     1. How satisfied are you with the opportunities which exist in this organization for advancement or promotion
63 (55)       5 (4)           47 (41)     2. How satisfied are you with the nature of the work you perform
39 (34)       30 (26)         46 (40)     3. How satisfied are you with the person who supervises you, your organizational superior
47 (41)       21 (19)         47 (41)     4. How satisfied are you with your relations with others in the organization with whom you work, your co-workers and peers
28 (24)       41 (36)         46 (40)     5. How satisfied are you with the pay you receive for your job
53 (46)       5 (4)           57 (50)     6. Considering everything, how satisfied are you with your current job situation

                                          Agho, Price, and Mueller (1992)
43 (43)       10 (9)          62 (54)     2. I feel fairly well satisfied with my present job
44 (38)       6 (5)           65 (48)     3. I am satisfied with my job for the time being

42.4 (38)     37.5 (18)       17.8 (16)   Mean

Study 1 Results Summary. Analysis of the qualitative results provides evidence for all three types of miscomprehension and suggests that this miscomprehension may not be readily detectable statistically. The linguistic ambiguity within these scales is therefore a potentially significant but typically undetectable source of error.

Study 2, Phase 1: Respondent Self-Classification Into Types of Miscomprehension

In order to ensure that the miscomprehension observed in the first study was not an artifact of the coding and classification process, we ran a second study to verify our results.

Method

Study 2 was survey based and had three parts. The first was the normal presentation of the scales (i.e., with a modified Likert response scale), and the last section asked for demographic information. The second section, however, was very different.

Participants were presented with the items and then invited to choose the interpretation that most closely matched their understanding of the original item. We derived these alternative interpretations from the open-ended responses gathered in Study 1. In order to assess sentential miscomprehension, each stimulus item was presented with several response options, including a neutral paraphrase of the original item (N), an enriched response (E), a depleted response (D), a response that was both depleted and enriched (D&E), and an option for respondents who did not agree that any of the choices were consistent with their understanding (I). A sample item is presented in Table 3.

Table 3. Sample Items for Study 2, Phase 1.

Sentential Miscomprehension

Actual item: How satisfied are you with the nature of the work you perform?
N: How satisfied are you by the type of work you do?
E: How satisfied are you with your job and its responsibilities?
D: How satisfied are you with your job?
D&E: How satisfied are you that you are doing a good job?
I: None of the interpretations offered match my interpretation of the question/statement

Lexical Miscomprehension

Actual item: I feel fairly well satisfied with my present job
N: My present job leaves me fairly well satisfied
P: My present job makes me fairly happy
T: My present job meets my expectations
I: None of the interpretations offered match my interpretation of the question/statement

Note: Sentential miscomprehension: N = neutral paraphrase of the original item; E = an enriched response; D = a depleted response; D&E = a response that was both depleted and enriched; I = an option for respondents who did not agree that any of the choices were consistent with their understanding. Lexical miscomprehension: N = neutral interpretation; P = pleasurable interpretation; T = transactional interpretation; I = a response for not agreeing with any of the choices.


Each stimulus was presented twice with differing sets of possible interpretations (i.e., neutral,

enriched, etc.) on each occasion, in order to verify that the respondents' selections were not simply

an artifact of the specific choices presented. If a respondent is interpreting the item accurately, they

should select the neutral option both times.
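The logic of the repeated presentation can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function names, the outcome labels, and the sample data are assumptions introduced here. It classifies each respondent's pair of selections for one item as consistently neutral, consistently non-neutral, or inconsistent across the two presentations.

```python
from collections import Counter

NEUTRAL = "N"  # category code for the neutral paraphrase, as in Table 3


def classify_consistency(first_choice, second_choice, neutral=NEUTRAL):
    """Classify one respondent's two selections for a single item.

    The three outcome labels are hypothetical names used for illustration.
    """
    if first_choice == neutral and second_choice == neutral:
        return "consistent_neutral"
    if first_choice == second_choice:
        return "consistent_non_neutral"
    return "inconsistent"


def summarize_consistency(pairs):
    """Count outcomes over (first, second) selection pairs for one item."""
    return Counter(classify_consistency(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Made-up selections from four respondents for one item.
    example_pairs = [("N", "N"), ("E", "E"), ("N", "D"), ("I", "N")]
    print(summarize_consistency(example_pairs))
```

Only respondents in the first group meet the criterion described above of selecting the neutral option on both occasions.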

Similarly, when assessing lexical miscomprehension we focused on the different meanings of the word "satisfaction." Respondents were offered neutral (N), pleasurable (P), and transactional (T) interpretations and a response for not agreeing with any of the choices (I). A sample item is presented in Table 3. As with sentential miscomprehension, each stimulus was presented twice with differing response options. If linguistic ambiguity has no effect on survey research, then respondents should either pick the neutral item, which was a paraphrasing of the initial question, or, at least, all choose the same non-neutral option.

The second test for lexical miscomprehension included the use of three items with ambiguous modifying terms: "I am often bored with my job," "Most days I am enthusiastic about my work," and "I like my job better than the average worker does" (Agho et al., 1992). Respondents were presented with different options for each term indicating different frequencies, time periods, or comparative groups, respectively. For example, for the item "I am often bored with my job," responses ranged from "More than 33% of the time I am bored with my job" to "More than 75% of the time I am bored with my job."

The validity of the response choices was checked by sending them to a university linguistics

professor. She classified the responses into the sentential and lexical miscomprehension categories. She accurately classified 98% of the response choices into the same category as the

authors, suggesting that the choices accurately mirrored sentential and lexical miscomprehension

as outlined previously.

The sample for Study 2 came from two sources. First, we again sent the survey to some of our

contacts and asked them to forward it to other working adults and collected 165 valid responses from this group. We then collected an additional 100 responses using a paid Qualtrics panel of working

adults. This allowed us to check whether the phenomena observed were an artifact of our sampling

method or whether they also occurred in a broader sample that is likely to be more representative of

the population of working adults.

Results

Dimensions of the Sample. Two hundred sixty-five valid responses were received. Most of the respondents were natives of either the UK (37%) or US (52%), with the remaining 11% from a mix of other countries. The average age was 42 (SD = 11.5), 57.7% were female, and 87.5% had a college degree, with 52% having a master's degree or higher.

Statistical Tests for Scale Validity. As in Study 1, numerical responses to the first portion of the survey, where the items were administered in their conventional format, were subjected to statistical analysis. Coefficient alpha values were somewhat higher than in Study 1, varying between .87 and .91.

We established scale dimensionality in this sample by conducting confirmatory factor analysis using LISREL (Jöreskog & Sörbom, 2006) to examine factor structure. We examined each of the job satisfaction scales (Agho et al., 1992; Tsui et al., 1992) separately, each in combination with the measure of procedural justice (Sweeney & McFarlin, 1993), as we did not expect that a three-factor model including both measures of job satisfaction would provide a satisfactory fit to the data. In both cases, all items loaded significantly on the latent variable, and acceptable fit statistics were obtained (Comparative Fit Index [CFI] of .95 and .98; standardized root mean square residual [SRMR] of .043 and .059), although in both cases the chi-square test was significant. Chi-square can be problematic (Jöreskog, 1969) as it is very sensitive to both sample size and violations of distributional assumptions. Garson (2009) advises that chi-square test significance can be overlooked if other fit measures indicate good fit. Given that other fit statistics were consistently within acceptable range (Hair, Black, Babin, Anderson, & Tatham, 2006), and the reliability coefficients were strong, we concluded that the scales demonstrated sufficiently good fit that, if we were conducting a substantive analysis using these data, we would consider that we had confirmed the unidimensionality of each scale.

Sentential Miscomprehension. Over 90% of the respondents found that one of the four interpretations of the stimulus item offered (i.e., neutral, enriched, depleted, or enriched and depleted) coincided with their interpretation of the stimulus item. If respondents had followed the strict syntax of the stimulus item then they should have selected the neutral response. While this was the modal category and was, on average, selected 49% of the time (range 24%-68%), other categories, principally enrichment, were also commonly chosen. Table 4 shows the number and percentage of respondents demonstrating sentential miscomprehension.

These results confirm that respondents routinely go beyond the strict syntactic meaning of items. Moreover, this interpretive process does not appear to be uniform, and some items are enriched or depleted to differing degrees. These data accord with the results of Study 1.

The first section of the survey contained the scales in their conventional format (i.e., with Likert responses). This allowed us to see whether different interpretations produced significantly different scores. For 9 of 30 items the different interpretations produced significantly different scores on the Kruskal-Wallis test, with these demonstrating mild to moderate effect sizes of between r = 0.15 and 0.29 (Cohen, 1977). A Bonferroni correction was applied to post hoc Mann-Whitney results, such that differences are reported at a .01 level of significance. The categories that differed from the neutral interpretation and the direction in which they differed are shown in Table 4.

We conducted character, syllable, and word counts for each item and calculated a number of readability measures (after Jensen, 2009) to explore the impact of item length and complexity on comprehension. These were then compared to the number of neutral interpretations of each item, as fewer neutral interpretations suggest greater miscomprehension. No significant relationship was observed, suggesting that item length and complexity did not correlate with miscomprehension.

Lexical Miscomprehension. Over 90% of respondents found that the options that they were presented with matched their own interpretation of the item. Analysis of respondents' understanding of the word "satisfied," presented in Table 5, suggests that there was significant deviation from the neutral response. Fifty-six percent of respondents selected the neutral option, which contained the word "satisfied" itself rather than any interpretation of it. Twenty-four percent of respondents selected a pleasurable interpretation of the word "satisfied," and 14% chose the transactional interpretation.

The number of respondents choosing pleasurable or transactional interpretations was lower than in Study 1. Statistical analysis of the responses to the different forms of lexical miscomprehension using the Kruskal-Wallis test demonstrated a significant difference for only 1 of 16 items, again with a mild to moderate effect size (r = .18; see Table 5).

Differing lexical interpretations were also observed. The word "often" in the item "I am often bored with my job" was interpreted as being 33% to 75% of the time. "Most" in "Most days I am enthusiastic about my work" was either 3 of 5 or 4 of 5 days, with a significant number understanding "most" as "more often than not." Finally, for the item "I like my job better than the average worker does," a number of different options were given to define "the average worker." Most respondents selected either "the typical worker in this country" or "the average person," but 14% selected either "coworkers" or "peers" as their comparator.


Although the difference between categories was significant when analyzed using the Kruskal-Wallis test as an omnibus test, none of the categories reached significance in post hoc analysis using Mann-Whitney with a Bonferroni correction, indicating that there is no significant difference in scale score based on respondent interpretation of the item.

These data present strong evidence that individuals interpret words differently. Some regard being satisfied as a pleasurable experience whereas others regard it as a transactional one, even when the original intent of the author was one or the other. Modifying words such as "often" are similarly open to interpretation as individuals draw on different mental schemata.

Table 4. Number and Percentage (in Parentheses) of Respondents Demonstrating Various Forms of Sentential Miscomprehension.

Scale/item   Pr.   N              E              D              D&E            I

Tsui, Egan, and O'Reilly (1992)
1.           1     164 (61)       58 (21)        22 (8)         15 (5)         6 (2)
             2     145 (55)       45 (17)        46 (17)        19 (7)         8 (3)
2.           1     180 (68)       47 (17)        13 (4)         20 (7)         4 (1)
             2     168 (63)       28 (10)        37 (13)        23 (8)         9 (3)
3.           1     163 (61)       36 (13)*a      30 (11)        23 (8)         12 (4)*a
             2     130 (49)       66 (25)*a      16 (6)         37 (14)*a      14 (5)
4.           1     175 (66)       31 (11)        35 (13)        16 (6)         6 (2)
             2     102 (38)       49 (18)        46 (17)        58 (22)        8 (3)
5.           1     131 (49)       39 (14)        75 (28)        15 (5)         5 (1)
             2     89 (33)        94 (35)*a      40 (15)        20 (7)         20 (7)
6.           1     152 (57)       8 (3)          79 (29)        24 (9)         2 (0)
             2     119 (45)       22 (8)         69 (26)        43 (16)        9 (3)

Sweeney and McFarlin (1993)
1.           1     96 (36)        60 (22)        72 (27)        20 (7)         16 (6)
             2     97 (37)        89 (33)        30 (11)        17 (6)         29 (11)
2.           1     123 (46)       63 (23)        29 (11)        41 (15)        7 (2)
             2     155 (58)       67 (25)        19 (7)         12 (4)         10 (3)
3.           1     88 (33)        56 (21)        38 (14)        75 (28)        5 (1)
             2     152 (58)       41 (15)        41 (15)        14 (5)         14 (5)
4.           1     64 (24)        78 (29)        41 (15)        52 (19)        27 (10)
             2     74 (28)        87 (33)        54 (20)        27 (10)        18 (6)

Agho, Price, and Mueller (1992)
1.           1     74 (28)        81 (30)*a      71 (26)        18 (6)*a       20 (7)
             2     155 (58)       34 (12)*a      38 (14)        7 (2)          30 (11)
2.           1     147 (56)       58 (22)        31 (11)*b      16 (6)         10 (3)
             2     150 (56)       27 (10)        66 (25)*a      12 (4)         9 (3)
3.           1     158 (60)       69 (26)        19 (7)         5 (1)          12 (4)
             2     147 (55)       57 (21)        23 (8)         22 (8)         14 (5)
4.           1     127 (48)       74 (28)        23 (8)         30 (11)        8 (3)
             2     100 (38)       86 (32)        26 (9)         28 (10)*b      23 (8)*b
5.           1     147 (55)       42 (15)        17 (6)         43 (16)*b      14 (5)
             2     91 (34)        82 (31)        15 (5)         40 (15)        35 (13)

Grand mean         128.7 (48.9)   55.8 (21.2)    38.7 (14.7)    26.4 (10)      13.4 (5)

Note: Pr. = presentation (each item was presented twice with a set of interpretations for each presentation); N = neutral; E = enriched; D = depleted; D&E = depleted and enriched; I = inappropriate interpretation. * Denotes significant differences in score compared to neutral category. a Denotes mean below the neutral response mean. b Denotes mean above the neutral response mean.





Study 2, Phase 2

Phase 1 of the study demonstrated primary effects, where differences in interpretation resulted in

significant differences in score. We have already raised the possibility of secondary effects, where

differences in interpretation affect other constructs. So to explore this possibility we repeated Phase

1 and incorporated a measure of turnover intention.

Method

The survey used was exactly the same as that used in Phase 1, but with the addition of two items from the Camman, Fischman, Jenkins, and Klesh (1983) scale measuring turnover intention. The survey was administered using Amazon's Mechanical Turk (mTurk) to obtain a sample of respondents who were currently employed. mTurk has been used effectively in various fields, including both linguistic and psychology studies (see Mason & Suri, 2012; Sprouse, 2011).

Results

Of the 250 respondents who completed the survey, 39 failed checks built in to test for careless responding, leaving a final sample size of 211 valid responses. The checks included 3 items scattered through the survey such as "If you are reading this, please select disagree." Participants were discarded from the sample if they failed two or more of these checks and also completed the survey very quickly. This is consistent with the methods described by Meade and Craig (2012) to detect and eliminate cases in which the respondent is not attending to the content at all.
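The exclusion rule just described can be expressed as a short screening function. The sketch below is illustrative only: the record layout, the function names, and in particular the completion-time cutoff are assumptions, since the text says only that excluded respondents completed the survey "very quickly" and gives no numeric threshold.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    respondent_id: str
    failed_checks: int    # attention-check items answered incorrectly (0 to 3)
    seconds_taken: float  # total completion time in seconds


def is_careless(resp, min_failed=2, min_seconds=300.0):
    """Flag a respondent who failed at least `min_failed` checks AND was fast.

    `min_seconds` is a hypothetical cutoff chosen for illustration; the
    source does not report the value that was actually used.
    """
    return resp.failed_checks >= min_failed and resp.seconds_taken < min_seconds


def screen(respondents, **kwargs):
    """Return only the respondents retained for analysis."""
    return [r for r in respondents if not is_careless(r, **kwargs)]


if __name__ == "__main__":
    sample = [
        Respondent("a", 0, 600.0),
        Respondent("b", 2, 120.0),  # failed two checks and was fast: excluded
        Respondent("c", 3, 900.0),  # failed checks but was not fast: retained
    ]
    print([r.respondent_id for r in screen(sample)])
```

Requiring both conditions, as the text describes, is more conservative than excluding on failed checks alone, because a respondent who misreads check items but takes time over the survey is retained.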

Of respondents, 51.2% were female, the average age was 35.2 (SD = 10.9), 95.3% of respondents were American, and 78.7% had a college degree or higher, with 15.4% having a master's degree or higher.

Table 5. Number and Percentage of Respondents Demonstrating Lexical Miscomprehension of the Word "Satisfied."

Scale/item   Pr.   N              P              T              I

Tsui, Egan, and O'Reilly (1992)
1.           1     145 (55)       63 (23)        48 (18)        7 (2)
             2     139 (53)       78 (29)        36 (13)        9 (3)
2.           1     141 (53)       87 (33)        26 (9)         9 (3)
             2     176 (67)       41 (15)        35 (13)        10 (3)
3.           1     151 (57)       38 (14)        59 (22)        14 (5)
             2     156 (59)       54 (20)        45 (17)*a      8 (3)
4.           1     118 (44)       103 (39)       24 (9)         18 (6)
             2     169 (65)       53 (20)        32 (12)        6 (2)
5.           1     127 (48)       55 (20)        76 (28)        6 (2)
             2     79 (30)        95 (36)        78 (29)        10 (3)
6.           1     151 (57)       89 (33)        18 (6)         5 (1)
             2     139 (52)       78 (29)        34 (12)        12 (4)

Agho, Price, and Mueller (1992)
2.           1     163 (63)       44 (17)        37 (14)        14 (5)
             2     176 (67)       54 (20)        23 (8)         8 (3)
3.           1     168 (63)       59 (22)        14 (5)         23 (8)
             2     183 (69)       40 (15)        23 (8)         16 (6)

Grand mean         148.8 (56.7)   64.4 (24.5)    38 (14.4)      10.9 (4.1)

Note: Pr. = presentation (each item was presented twice with a set of interpretations for each presentation); N = neutral; P = pleasurable; T = transactional; I = inappropriate interpretation. * Denotes significant differences in score compared to neutral interpretation. a Denotes mean below the neutral response mean.


Sentential Effects on Turnover Intention. We noted in Study 1 that the item "I am satisfied with my job for the time being" (Agho et al., 1992) seemed to trigger an association with turnover intention in some respondents. In Study 2, one of the enriched interpretations reflects this. Respondents therefore could choose between a neutral interpretation of the item and an enriched version, "At the moment I am satisfied with my job and I am not looking for a new one." There was no significant difference detectable in item score across the interpretations, suggesting that the impact of the miscomprehension is not directly detectable. Accordingly we selected this item for analysis.

We compared the turnover intention scores of those who selected the enriched interpretation of

the item with those who selected the neutral interpretation. Using a Mann-Whitney test to compare

the scores for these two groups we found a difference that approached significance, U = 1,867, z = 1.88, p = .060, r = .13, whereby those enriching the item scored higher on turnover intention.

This suggests that sentential miscomprehension can have indirect effects, as such a result is unlikely

to have happened by chance (Nickerson, 2000).
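For readers who wish to see how statistics of this kind are obtained, the following Python sketch computes a Mann-Whitney U, its normal-approximation z and two-sided p, and the effect size r = |z| / sqrt(N), a common convention for this test. It is a minimal illustration under stated assumptions: the function names and sample data are invented here, no continuity correction is applied, and whether the reported values were computed with exactly this convention is an assumption rather than something stated in the text.

```python
import math


def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(group_a, group_b):
    """Return (U, z, p, r) using the tie-corrected normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)  # report the smaller of the two U values
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all observations are identical; z is undefined")
    z = (u - n1 * n2 / 2) / math.sqrt(variance)  # zero or negative, since u is the smaller U
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided p value
    r = abs(z) / math.sqrt(n)                    # effect size r = |z| / sqrt(N)
    return u, z, p, r


if __name__ == "__main__":
    # Made-up turnover-intention scores for two interpretation groups.
    neutral = [1, 2, 2, 3, 3, 4]
    enriched = [2, 3, 4, 4, 5, 5]
    print(mann_whitney(neutral, enriched))
```

Likert responses produce many tied values, which is why the sketch uses mean ranks and the tie-corrected variance rather than the simpler untied formula.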

Lexical Effects on Turnover Intention. We explored the possibility that lexical miscomprehension might also have indirect effects by examining the relationship between differing interpretations of the word "satisfaction" and turnover intention. We chose the satisfaction item that best reflected the global concept of job satisfaction: "Considering everything, how satisfied are you with your current job situation?" In Phase 1 there was no significant difference in item score among the types of interpretation offered.

We found a significant difference across groups when comparing the transactional interpretation to the pleasurable interpretation, U = 569, z = 2.95, p = .013, r = 0.20, and also when comparing the transactional interpretation to the neutral one, U = 1,208, z = 2.38, p = .017, r = .16.

These results show that differences that do not manifest themselves on item score may, as linguistic theory suggests (Schwarz, 1999), reflect differing cognition and have indirect effects.

Study 2 Results Summary. The findings of Phase 1 of Study 2 confirm the findings in Study 1, suggesting that they are not the product of researcher confirmation bias (see Nickerson, 1998). Furthermore, Phase 2 of Study 2 demonstrates that miscomprehension may also have indirect effects.

Study 3: Replication of Results With Spreitzer's Empowerment Scale

To allay concerns that the findings demonstrated were an artifact of the scales selected, we replicated Studies 1 and 2 with Spreitzer's (1995) 12-item empowerment scale. This scale was developed using best practices for scale development and the author reported strong evidence of construct validity. The scale includes four subdimensions of psychological empowerment: meaning, competence, self-determination, and impact.

Method

The replication was carried out in two phases. Phase 1 involved administering the 12 items of the scale to a pool of 29 participants, as in Study 1. The results of this first phase were then used to create the item interpretations for the second phase, which replicated Study 2. In this phase we constructed neutral, enriched, depleted, and depleted and enriched interpretations for each item in Spreitzer's (1995) scale and asked respondents to select the interpretation that most closely matched their own interpretation. We again collected demographic data and asked the respondents to answer the scale in its original format. Data were collected from 100 employed workers through Amazon's Mechanical Turk.





The reason why individuals did not read the instructions is unclear, but it seems probable that familiarity with the task and the seeming obviousness of what needs to be done both contribute. The highly educated nature of the sample may mean that they are more regularly surveyed. They may also take mental shortcuts as they believe they know what is required. This would seem particularly true for the RMNet group, for whom surveys are likely to be common currency.

The fact that a large number of RMNet members (12/34) did not read the instructions is understandable but nonetheless a cause for concern. When developing scales, researchers routinely ask their colleagues' opinions on various matters and use them to help generate items. This result suggests that Schriesheim and colleagues (1993) were correct when they asserted that academic colleagues might not be the ideal assistants.

Sentential Miscomprehension

The results suggest significant sentential miscomprehension. This is most dramatically observed in the procedural justice scale (Sweeney & McFarlin, 1993) where (depending on the item) 21% to 31% of respondents appeared to miss the process element and hence answer a question about distributive justice. The figure was rather lower for the RMNet group, which we believe may be because they are more likely to be sensitized to the procedural element of the question. The fact that the group that ignored the procedural component was indistinguishable statistically from the respondents interpreting the question correctly means that any results from this scale would contain substantial and hidden error. However, as the literature is replete with studies in which procedural and distributive justice are highly correlated (Colquitt, 2001), this study perhaps sheds some new light on the source of some of this collinearity.

Study 2 showed that enrichment, depletion, or a combination of the two were demonstrated by

about half of respondents. This suggests that around half of respondents deviate from the strict syntax of the item and alter it according to their own understanding.

The impact of sentential miscomprehension is difficult to ascertain. For 9 of the 30 items, differing interpretations produced significantly different results on the traditional presentation of the item. This effect, however, was inconsistent. It seems likely that the process of enrichment or depletion meant that respondents tapped into other concepts when responding. For example, the item "I am satisfied with my job for the time being" was interpreted 22% of the time as "At the moment I am satisfied with my job and I am not looking for a new one," an interpretation that, as Phase 2 of Study 2 shows, also taps into turnover intention. The implications of this are both direct and indirect. Directly, some respondents tap into a construct other than job satisfaction. Indirect effects might potentially occur when job satisfaction and turnover intention are included in the same study. The linguistic interaction for some (but not all) of the participants would potentially interfere with the validity of the results obtained.

Sentential miscomprehension therefore has the potential to introduce substantial error. This may

not be detectable using conventional scale appraisal techniques, as the scales in this study had high

coefficient alphas and were unidimensional in factor analyses. Despite this, the error has considerable impact on construct validity and sporadic impact on scale score.

Lexical Miscomprehension

Although lexical miscomprehension is difficult to ascertain, the different understandings of the word "satisfaction" and the variation in meaning of the word "often" in this study demonstrate that there is linguistic variance among respondents.

What is the impact of this? It seems likely that respondents are answering somewhat different questions as a result of lexical miscomprehension of the word "satisfied." Again this raises the possibility that some respondents may be tapping into different constructs when they interpret items. This will have both primary effects on the scale itself and secondary effects on any model incorporating constructs similar to those being unconsciously tapped into by the respondent.

Lexical miscomprehension is most clearly observed in the different interpretations of "often." When responding to the statement "I am often bored with my job," if the threshold for "often" is one-third of the time then you are likely to respond differently to this question than would someone for whom "often" means 99% of the time.

The results demonstrate that individuals have different interpretations of individual words, yet this is not appreciable using conventional statistical techniques. The impact of this is that respondents are drawing on quite different conceptual schemata and referents, with attendant diminution of construct validity.

Consequences and Remedies

The findings of this study make uncomfortable reading for those of us involved in survey research. The degree of variance observed linguistically would be serious cause for concern if it were observed numerically. All four measures used in this series of studies have been published in high-quality journals and frequently cited. They have furthermore been subjected to significant previous construct validation analyses. And yet they all contain linguistic threats to validity.

The reliability and validity of measurement instruments and surveys are of pivotal importance. If

the measures do not measure what they purport to, or do not do so accurately, then recommendations

based upon research that employs these measures are likely to be flawed. So what is to be done? We

propose strategies to minimize each form of miscomprehension for those developing new scales and

those using existing scales. We begin by providing principles to undergird item construction.

Instructional Miscomprehension. Putting borders around the instructions and using bold type or capitals do not seem to completely eliminate instructional miscomprehension, as these were all used in Study 1 to little effect. The literature on manipulation checks, however, offers a possible direction. Oppenheimer, Meyvis, and Davidenko (2009) used an effective system to detect failure to follow instructions. Participants were presented with a survey about sports participation. The instructions indicated that participants should ignore the first question in the survey and instead click on the page title. Those who had not read the instructions were able to proceed with the survey normally, allowing researchers to compare the responses of those who read the instructions with those who did not. This is similar to the "captcha" or "reverse Turing test" advocated by Mason and Suri (2012), where a particular response is requested by the survey item to prove that the participant is paying attention and motivated. Use of these approaches improves engagement, reducing the proportion of invalid responses from 48.6% to 2.5% (Kittur, Chi, & Suh, 2008).

There is a danger that these seemingly counterintuitive instructions (e.g., to ignore a particular

item) might confuse respondents. Instructing participants to pay close attention to the items and

informing them that tests that measure their carefulness or attentiveness are being used may help.

Care should also be taken to ensure that such tests do not interfere with respondents' capacity to answer other items in the survey. There is also a potential danger that selecting only those respondents who obey all the instructions excludes particular groups, for example those with a particular personality trait. Nonetheless, tests of this sort are a useful addition to any survey to ensure that participants have attended to the instructions and are sufficiently motivated.

Sentential Miscomprehension. When looking at the structure of the item, it is important to eschew vague words such as "many," "most," "often," or "sometimes." These have no formal quantity and so represent an open invitation to miscomprehension. Try to use a quantity so that instead of an item that says "Most days I am enthusiastic about my work" use "I am enthusiastic about my work at least 75% of the time." When asking for a comparison, ensure that the comparator is clear. The UK Office of National Statistics Wellbeing Survey (Office of National Statistics, 2013), for example, contains an item "Overall, how satisfied are you with your life nowadays?" "Nowadays" is a vague term. A better item would be "Overall, how happy have you been with your life over the last three months?"

Comprehension may also be improved by the use of bold or italicized elements of items (see Christian & Dillman, 2004). As those respondents not following instructions have already been eliminated, it seems likely that those remaining in the sample are more motivated and attentive. The use of bold or italics may help emphasize elements of the item, for example, "How fair or unfair are the procedures used to determine pay rates?" (with the word "procedures" set in bold or italics), thus avoiding some of the problems witnessed with the Sweeney and McFarlin (1993) scale.

Lexical Miscomprehension. There is a great deal of extant advice on item construction (e.g., DeVellis, 2003; Groves et al., 2004) and our recommendations are intended to supplement this, not supplant it. We first consider the actual words used in the item.

The tendency for multiple interpretations of the same word (polysemy) that we have observed with the word "satisfaction" suggests that care should be used to avoid words with multiple meanings. The number of different meanings of a particular word can be examined using a dictionary. As an example, the word "happy" might be preferred to the word "satisfied" when asking about the relationship between an employee and their organization if the measure is intended as an affective one. If another word is not available, then linguistic theory suggests that context aids comprehension. This means that the scale authors should provide careful guidance, through contextual information, as to which meaning the respondent should select. As this contextual information is typically provided in the instructions, the problem of instructional miscomprehension becomes even more of a concern and the use of effective mechanisms to combat it even more vital.

Ill-defined words, such as "meaningful," should also be avoided. While the researcher may have a clear idea of what a concept means, the respondents may not. Plain, short, commonly used words are most likely to be understood and reduce miscomprehension. Care should also be taken to avoid jargon and culturally specific terms. The word "quite," as in "quite good," for example, is a superlative in the US but represents borderline mediocrity in the UK. In some cases researchers may abjure words altogether and use pictograms, such as the smiling/frowning faces used by Kunin (1998). Those using existing scales should be similarly critical of words used, even in published scales, as the authors may not have attended to issues of miscomprehension.

Survey Construction. Moving from the individual item to survey composition allows us to use linguistic theory to aid comprehension. Given that comprehension is a function of syntax and context, a sensible approach is to provide plenty of context to ensure a more uniform and predictable interpretation process. While there have been conflicting views of the necessity or indeed the desirability of intermixing items (Schriesheim & Denisi, 1980; Schriesheim, Kopelman, & Solomon, 1989; Sparfeldt, Schilling, Rost, & Thiel, 2006), we suggest that it might potentially have an effect on miscomprehension, as surrounding items may well provide contextual information that helps the respondent understand the item, and so grouping may help reduce miscomprehension (Tourangeau & Rasinski, 1988). We therefore recommend grouping items together.

Similarly, more thoughtful instructions (again with an attentiveness check) could help improve item comprehension. If the heading for the Sweeney and McFarlin (1993) scale contained the instruction "We now want you to think about the processes by which a number of different things that affect you at work are decided," then this may help emphasize the procedural element of the scale.





While it is clear that linguistic errors are a considerable concern in survey research, few researchers are looking for them. When a new scale is being developed, researchers might ask expert judges if they understand the item, or ask whether the item appears to them to measure the construct, but they do not typically ask respondents what they think the item means. If an existing scale is used, then the provenance of publication may well ensure that even less attention is paid to the properties of the scale.

Testing and Evaluation. After developing a scale that appears to minimize linguistic miscomprehension while sampling the content domain appropriately, we suggest that researchers assess the degree of linguistic ambiguity that the items produce within the target population. This field testing should be used both during scale development, where it should be thought of as a distinct and necessary step in the scale development process, and also when using preexisting scales. As linguistic theory suggests that comprehension is a function of both syntax and context, any change in context necessitates a check to ensure that the item is still uniformly understood by respondents and, crucially, in the same way as the researcher.

We have found, during the course of our own research, that the approach we adopted in Study 1 was very effective at identifying ambiguity. This simply asks respondents to describe, in their own words, what they think the question means. This technique can be used both in developing new items and in appraising preexisting ones. The advantage of this approach is that it can be administered remotely, and it elicits useful information. For item development purposes it is not necessary to go through the coding processes we did in Study 1, as inspection of the responses is usually sufficient to establish whether the item is being homogeneously interpreted. This approach is probably only suitable, however, for about 15 to 20 items, as it is time-consuming for the respondent. For longer surveys one might use a piecemeal approach in which the whole survey is broken down into blocks of 15 to 20 items and each block is administered to a separate sample.
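
The piecemeal approach described above could be set up along the following lines. This is a minimal sketch rather than a procedure we used: the function names, the default block size of 20, and the random assignment of respondents are all illustrative assumptions. Items are kept in their original order within each block so that neighboring items continue to supply context for one another.

```python
import random
from typing import Dict, Hashable, Iterable, List


def split_into_blocks(items: List[str], max_block_size: int = 20) -> List[List[str]]:
    """Split items into consecutive, near-equal blocks of at most max_block_size."""
    if max_block_size < 1:
        raise ValueError("max_block_size must be positive")
    if not items:
        return []
    n_blocks = -(-len(items) // max_block_size)  # ceiling division
    base, extra = divmod(len(items), n_blocks)
    blocks: List[List[str]] = []
    start = 0
    for i in range(n_blocks):
        # The first `extra` blocks take one additional item each.
        size = base + (1 if i < extra else 0)
        blocks.append(items[start:start + size])
        start += size
    return blocks


def assign_respondents(
    respondent_ids: Iterable[Hashable], n_blocks: int, seed: int = 0
) -> Dict[Hashable, int]:
    """Randomly assign each respondent to one block, in near-equal numbers."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    return {rid: i % n_blocks for i, rid in enumerate(shuffled)}


# Hypothetical 45-item survey: three blocks of 15 items each.
survey_items = [f"Item {i}" for i in range(1, 46)]
blocks = split_into_blocks(survey_items, max_block_size=20)
allocation = assign_respondents(range(90), n_blocks=len(blocks))
```

Each respondent then paraphrases only the items in his or her assigned block, which keeps the task within the 15 to 20 item range suggested above.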

The scale development literature includes a number of approaches to further ensure both content adequacy and homogeneity of comprehension of survey items. Schriesheim et al. (1993) suggest using Q-methodology to help measure the differences between individual judges. Anderson and Gerbing (1991) also offer methods for pretesting with small samples, although their technique focuses more on predicting confirmatory factor analysis (CFA) performance. Hinkin (1998) has suggested both these approaches to test content validity. Hinkin and Tracey (1999) built on this work and provided an analysis of variance technique that allows for evaluation of item distinctiveness as part of the content validation process. In public opinion surveys, cognitive interviewing is commonly used to pretest items (Beatty & Willis, 2007; Schwarz & Sudman, 1996). This may take the form either of asking respondents to think aloud as they answer the survey question (Ericsson & Simon, 1980) or of probing specific areas of understanding to help draw out elements of the respondents' thinking (Willis, DeMaio, & Harris-Kojetin, 1999).

In all survey development there is a tension between specificity and applicability. Words and items that are too specific will not be applicable in other contexts; similarly, general words and items may not be sufficiently precise to reflect subtle differences. The process of thoughtful development and field testing should help ensure that the researcher successfully navigates between these two poles.

A Note for Reviewers

Those reviewing papers using survey research should also attend to issues of linguistic ambiguity, as quality measurement is the responsibility of both researcher and reviewer (Hinkin, 1995). The first step should be to require anyone who submits a paper based on survey research to provide the measure for examination (see Note 4). Reviewers should then, having ruled out normal threats to validity (e.g., double-barreled items), look at the individual words in the item to see if the item contains any modifiers such as "most" or "often," or words with multiple meanings, such as "satisfaction." The next step should be to inspect the syntax of the item to make sure that referents are clear, for example, "yesterday" rather than "recently." Reviewers should then assure themselves that the authors have ascertained whether the item is uniformly comprehensible to the target audience. Finally, reviewers should inquire as to the steps taken by researchers to ensure that their respondents have read the instructions and are motivated throughout the survey. These steps, taken together, should help reduce poorly worded items and weed out unmotivated participants, thus improving the quality of research based on soliciting opinions.
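
The word-level part of this checklist can be partly automated as a first pass. The sketch below is illustrative only: the three word lists contain just the examples named above and are hypothetical starting points, the function name is our own, and a simple word match cannot replace a reviewer's reading of the item or evidence of how respondents actually interpret it.

```python
import re
from typing import Dict, List, Set

# Hypothetical starter lists, seeded only with the examples given in the text.
VAGUE_MODIFIERS: Set[str] = {"most", "often"}
MULTI_MEANING_WORDS: Set[str] = {"satisfaction"}
VAGUE_TIME_REFERENTS: Set[str] = {"recently"}


def flag_item(item: str) -> Dict[str, List[str]]:
    """Return the potentially ambiguous words found in one survey item, by type."""
    words = set(re.findall(r"[a-z']+", item.lower()))
    return {
        "vague_modifiers": sorted(words & VAGUE_MODIFIERS),
        "multi_meaning_words": sorted(words & MULTI_MEANING_WORDS),
        "vague_time_referents": sorted(words & VAGUE_TIME_REFERENTS),
    }


# Hypothetical item, written for illustration and not taken from any scale.
flags = flag_item("Recently, I have often questioned my satisfaction with my pay.")
for category, found in flags.items():
    print(f"{category}: {found}")
```

Items that are flagged would then be candidates for the respondent paraphrasing check described in the Testing and Evaluation section.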

It is unlikely, however, that linguistic ambiguity will ever be eliminated. As linguistic philosophers have pointed out, and as we discussed in the linguistic theory section, there is always an indeterminacy of language. Nonetheless, these approaches to item development and testing will help identify and eliminate the more obvious forms of instructional, sentential, and lexical miscomprehension.

Limitations and Future Directions

This article used two different approaches to explore the impact of linguistic pragmatics on survey interpretation through the examination of four carefully chosen scales. By using a combination of open-ended and fixed response format questions we have aimed to use methods that have non-overlapping weaknesses in addition to their complementary strengths (Brewer & Hunter, 2006, p. 4). The four scales used here, however, can hardly be seen as representative of all available scales. Future studies should extend this analysis to other scales.

Future research could explore the properties of items and words, and their context, that lead to them being either sententially or lexically miscomprehended. This would require considerable effort, but given careful design and sufficient respondents, it may be possible to produce general principles for reducing the impact of linguistic factors on survey research, particularly as the extant linguistics literature has examined some of these issues already.

Another interesting area for future research would involve examining whether native English speakers are more likely to sententially or lexically miscomprehend items than non-native speakers. Theoretically, native English speakers should have a more nuanced vocabulary and so be more likely to make linkages to other English words than non-native speakers. This may mean that non-native English speakers are actually better survey respondents, as they are more likely to interpret items appropriately. Research with children, who similarly have a more restricted vocabulary, suggests that they produce a more restricted set of interpretations (Noveck, 2001). In addition, future research might examine the possibility that there may be individual differences that drive linguistic miscomprehension.

Finally, the current study has been within the paradigm of classical test theory. Item response theory (IRT) might offer an alternative approach to identifying linguistic ambiguity. IRT's ability to examine bias by comparing the performance of individual items has been used to identify characteristics of respondent populations, for example, those faking personality tests (Zickar, Gibby, & Robie, 2004). IRT has also been used to explore characteristics of surveys, for example, context effects (Rivers, Meade, & Lou Fuller, 2009), the effects of extreme wording (Nye, Newman, & Joseph, 2010), and equivalence of translation (Ellis, 1989). Evaluation of item response curves may help identify differences in interpretation that are not readily appreciable using classical techniques.
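
To illustrate what comparing item response curves might involve, the sketch below assumes a two-parameter logistic model for a dichotomously scored item. This is a simplifying assumption of ours: rating-scale items would normally call for a polytomous model, and the parameter values shown are invented for illustration, not estimated from our data. The function names are likewise hypothetical.

```python
import math
from typing import Sequence, Tuple


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under a two-parameter logistic model.

    theta is the respondent's trait level, a the item discrimination,
    and b the item location.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mean_curve_gap(
    params_group1: Tuple[float, float],
    params_group2: Tuple[float, float],
    thetas: Sequence[float],
) -> float:
    """Average absolute gap between two groups' response curves for one item."""
    gaps = [
        abs(icc_2pl(t, *params_group1) - icc_2pl(t, *params_group2))
        for t in thetas
    ]
    return sum(gaps) / len(gaps)


# Trait grid from -3 to 3 in steps of 0.1.
thetas = [x / 10 for x in range(-30, 31)]

# Invented (a, b) values: same discrimination, different location in group 2.
gap_identical = mean_curve_gap((1.2, 0.0), (1.2, 0.0), thetas)
gap_shifted = mean_curve_gap((1.2, 0.0), (1.2, 0.8), thetas)
print(f"identical curves: {gap_identical:.3f}, shifted curves: {gap_shifted:.3f}")
```

If two groups of respondents at the same trait level respond to an item differently, so that their curves diverge, one possible explanation is that the groups are interpreting the item differently, which is the kind of difference a comparison of scale means may not reveal.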

Summary

Survey research is a critical weapon in the social scientist's methodological armory. It enables the opinions and feelings of large numbers of respondents to be rapidly ascertained and collated. Developments in statistical techniques have enabled more sophisticated analyses to be performed in order to enhance our understanding of social phenomena and processes. Surveys have tended to rest on the assumption of an unbroken chain of comprehension from the mind of the researcher, through the survey instrument, to the mind of the recipient, and back again. This assumption does not seem, on the basis of the results of this study, to be particularly robust. Respondents often either fail to follow instructions or miscomprehend the items presented. This is not readily detectable when the output from a survey is numerical, a problem that may be further exacerbated by changes in the context in which the items are presented.

This article is not intended to denigrate surveys as an information source or research tool, but rather it seeks to draw the reader's attention to some of the linguistic problems that underlie surveys and to demonstrate the magnitude of effect of these problems. The problem is, perhaps, most neatly summarized by the sociologist R. H. Tawney's (1971) comment that "Sociology . . . is a department of knowledge which requires that facts should be counted and weighed, but which, if it omits to make allowance for the imponderables, is unlikely to weigh or even count them right" (p. 147), a comment that seems as relevant and applicable to organizational survey research as it does to sociology.

Overall, research into the linguistics of survey items is a rich soil for future research. Given the misinterpretation described in this article, there is clearly much work to be done. Attention to the potential methodological issues outlined in this article should help produce better, more valid results that will in turn provide the basis for an improved understanding of social and organizational phenomena.

Authors' Note

All data are available from either author (Ben Hardy, [email protected], or Lucy Ford, [email protected]). We would like to thank Dr. Alyson Pitts, University of Cambridge, for her expert linguistics assistance, and Dr. Raina Brands, University of Cambridge, for her tireless assistance with data coding. We would also like to thank the three anonymous reviewers who provided invaluable feedback that greatly improved this manuscript. A previous version of this manuscript appears in the Academy of Management (2012) proceedings and was the recipient of the Sage Publications/Research Methods Division Best Paper Award.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

1. In this article, any distinction between statements (declaratives) and questions (interrogatives) is disregarded, as both serve as the stimulus to which the survey recipient is asked to respond. Accordingly, the terms will be used interchangeably.

2. The examples quoted in this section are actual responses from Study 1 in the empirical portion of this article.

3. As the Agho, Price, and Mueller (1992) satisfaction scale used declaratives, it was impossible to tell whether respondents were paraphrasing the question or answering it. Accordingly, this scale was not analyzed.

4. We understand that some authors may not wish their scales to be placed in the public domain. This is reasonable. It is not reasonable, however, given the problems of construct validity that numerous authors have identified, for the scales not to be shared with reviewers.


References

Agho, A. O., Price, J. L., & Mueller, C. W. (1992). Discriminant validity of measures of job satisfaction, positive affectivity and negative affectivity. Journal of Occupational and Organizational Psychology, 65(3), 185-196.

Anderson, J. C., & Gerbing, D. W. (1991). Predicting the performance of measures in a confirmatory factor analysis with a pretest assessment of their substantive validities. Journal of Applied Psychology, 76(5), 732-740.

Beatty, P. C., & Willis, G. B. (2007). Research synthesis: The practice of cognitive interviewing. Public Opinion Quarterly, 71(2), 287-311.

Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061-1071. doi:10.1037/0033-295x.111.4.1061

Brayfield, A. H., & Rothe, H. F. (1951). An index of job satisfaction. Journal of Applied Psychology, 35(5), 307-311.

Brewer, J., & Hunter, A. (2006). Foundations of multimethod research: Synthesizing styles. Thousand Oaks, CA: Sage.

Camman, C., Fischman, M., Jenkins, G. D., & Klesh, J. (1983). The Michigan organizational assessment survey: Conceptualization and instrumentation. In S. E. Seashore, E. E. Lawler III, P. H. Mirvis, & C. Camman (Eds.), Assessing organizational change: A guide to methods, measures and practices. New York, NY: Wiley-Interscience.

Christian, L. M., & Dillman, D. A. (2004). The influence of graphical and symbolic language manipulations on responses to self-administered questions. Public Opinion Quarterly, 68(1), 57-80. doi:10.1093/poq/nfh004

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. doi:10.1177/001316446002000104

Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York, NY: Academic Press.

Colquitt, J. A. (2001). On the dimensionality of organizational justice: A construct validation of a measure. Journal of Applied Psychology, 86(3), 386-400. doi:10.1037/0021-9010.86.3.386

Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6(2), 147-168. doi:10.1177/1094428103251541

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.

DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed.). Thousand Oaks, CA: Sage.

Ellis, B. B. (1989). Differential item functioning: Implications for test translations. Journal of Applied Psychology, 74(6), 912.

Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215-251. doi:10.1037/0033-295x.87.3.215

Fellbaum, C. (1990). English verbs as a semantic net. International Journal of Lexicography, 3(4), 278-301. doi:10.1093/ijl/3.4.278

Fields, D. L. (2002). Taking the measure of work: A guide to validated scales for organizational research and diagnosis. Thousand Oaks, CA: Sage.

Ford, L. R., & Scandura, T. A. (2007, November). Item generation: A review of commonly used measures and recommendations for future practice. Paper presented at the Southern Management Association, Nashville, TN.

Garson, G. D. (2009). Statnotes: Topics in multivariate analysis. Retrieved from http://faculty.chass.ncsu.edu/garson/pa765/statnote.htm

Grice, H. P. (1975). Logic and conversation. In J. L. Morgan & P. Cole (Eds.), Syntax and semantics (Vol. 3). New York, NY: Academic Press.

Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey methodology. Hoboken, NJ: John Wiley.

