ARTICLE IN PRESS

International Journal of Intercultural Relations 29 (2005) 73–89

www.elsevier.com/locate/ijintrel

Does intercultural sensitivity cross cultures? Validity issues in porting instruments across languages and cultures

Joe F. Greenholtz a,b,*

a UBC-Ritsumeikan Academic Exchange Programme, Ritsumeikan-UBC House, Room 330, 6460 Agronomy Road, Vancouver, BC, Canada V6T 1W9

b UBC-Ritsumeikan Academic Exchange Programme, Kyoto, Japan

* Tel.: +1 604 822 8604; fax: +1 604 822 9515.

E-mail address: [email protected].

0147-1767/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.

doi:10.1016/j.ijintrel.2005.04.010

Abstract

In 1986, Milton Bennett proposed his Developmental Model of Intercultural Sensitivity (DMIS), which describes the development of a person’s attitude towards other cultures through six linear stages: three ethnocentric (exclusive) stages (Denial, Defence, and Minimisation), in which a person’s own culture is the measure of all things, and three ethnorelative (inclusive) stages (Acceptance, Adaptation, and Integration), in which a person understands and values other cultural points of view as equal to his or her own.

The DMIS has been presented as pan-cultural. The stages of the DMIS are measured by the Intercultural Development Inventory (IDI). Version one of the IDI was a 60-item questionnaire that has also been characterised as ‘culture proof’.

This study examines the validity of data obtained using a Japanese translation of version one of the IDI (Bennett and Hammer, 1998) and, by extension, some assumptions surrounding the cross-cultural transferability of the DMIS upon which it is based.

The protocol for translating the IDI is reported and issues related to translating and adapting instruments are raised.

The results of the validation process raise questions about whether the concepts that comprise the IDI, and by extension the DMIS, are transferable across languages and cultures.

These questions are discussed and their implications for practitioners using existing instruments in a cross-cultural context, and in the context of validity issues, are elucidated.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Intercultural sensitivity; IDI; Validity; Translation; Cross-cultural; DMIS; Cross-linguistic

1. Introduction

This study was to comprise the first stage of a larger research project intended to examine the effects of a year abroad on the intercultural sensitivity of a group of Japanese undergraduates. However, the results obtained led me to temporarily curtail the rest of the research because of serious issues that were raised regarding the cross-cultural transferability of the theoretical framework and the validity of the research instrument I had used, in translation.

Bhawuk and Brislin (1992) have demonstrated that cross-cultural sensitivity is crucial for successful interactions with people from cultures other than one’s own, ranging from job performance during international assignments to tourism, immigration, and refugee resettlement. They suggest that the following are key predictors of success in intercultural contexts:

To be effective in another culture, people must be interested in other cultures, be sensitive enough to notice cultural differences, and then also be willing to modify their behavior as an indication of respect for the people of other cultures. A reasonable term that summarizes these qualities of people is intercultural sensitivity (p. 416).

Bennett’s (1986) Developmental Model of Intercultural Sensitivity (DMIS) captures these elements and provides a theory-based model for progressive degrees of intercultural sophistication. The DMIS demonstrates that the more interculturally sensitive one becomes, the more one begins to enjoy cultural difference, the better one becomes able to discriminate (in the denotative sense) cultural difference, and the more capable one becomes of recognising and adopting cultural perspectives other than one’s own.

As a developmental model, the DMIS delineates the successive stages of cultural understanding through which one must progress, each associated with particular attitudes and behaviours. The first three (of the six) stages are described as ethnocentric; one’s own culture is experienced as central to reality. When other cultures are experienced as relative to one’s own they lack substance or significance.

The latter three stages are described as ethnorelative and grow from the realisation that one’s own culture represents only one of many equally valid worldviews.

The stages of the DMIS are briefly sketched below.


2. The stages of the DMIS

The ethnocentric stages are:

Denial: The reality of other cultures is either not recognised at all, or is denied by erecting psychological or physical barriers to contact.

Defence: The existence of cultural difference is acknowledged, but with hostility, causing other cultures, as one example of Defence, to be denigrated as inferior in comparison with one’s own.

Minimisation: One’s own cultural values are seen as universal, with apparent cultural differences explained as cosmetic, surface variations.

The ethnorelative stages are:

Acceptance: Other cultures are accepted as complex, valid, alternative representations of reality.

Adaptation: One becomes sufficiently comfortable with cultural difference to adopt and shift in and out of alternative viewpoints.

Integration: One’s experience of self expands to include the worldview of other cultures.

The DMIS as a narrative describes linear movement, from ‘left to right’, through the ethnocentric stages to the ethnorelative ones, with each stage pre-requisite to the next. Because the acquisition of a more complex worldview makes it impossible to retreat to a more simplistic, less developed view of culture, the narrative is also basically unidirectional.

3. The intercultural development inventory (IDI)

The IDI was developed by Bennett and Hammer (1998) to operationalise the theoretical constructs of the DMIS.

The IDI (version one) is a 60-item psychometric self-report instrument using a seven-point Likert scale with responses ranging from one ‘strongly disagree’ through seven ‘strongly agree’. Bennett and Hammer (1998), in reporting their IDI development and validation procedures, and in the workshops that a practitioner must attend in order to qualify to use the IDI, stressed the universal applicability of the instrument (i.e., that it was impervious to the possible distorting effects of cultural and native-language differences). That made it seem to be an ideal instrument for the purposes of my research. However, that validation process used data only from participants (all in the United States although not necessarily US-American) who were sufficiently proficient in English to understand the items in the instrument. The problem with using the IDI with my subject pool was that the level of English proficiency required to understand and complete it is fairly high. In order to reduce or eliminate the possible confounding effects of language proficiency, I led a team that translated the IDI into Japanese. Given the relationship between language and cultural schemata, the transferability of the concepts measured by the IDI to a culturally and linguistically very different group of subjects cannot be taken for granted, nor can inferences from its results. Therefore, it was necessary to ‘re-establish’ validity for the translated version of the IDI before it could be used.

The Standards for Educational and Psychological Testing (American Psychological Association (APA), AERA, and National Council on Measurement in Education (NCME), 1985) define validity as “the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores” (p. 9, italics added). It is important to stress here that this means that an instrument cannot be validated, only the specific inferences from the data (scores) that it yields. It was important, then, to first establish the validity of the inferences that might be made from the scores from the Japanese translation of the IDI.

I thought it was particularly important to test the assertion that the IDI was ‘culture proof’ by privileging a translation that was faithful to the wording and concepts of the original. All of the translators, however, commented on the ‘foreignness’ to the Japanese mind of some of the concepts used in the instrument. These items were flagged for attention in the statistical analyses (see Section 4).

3.1. Procedures for translating the IDI

Four Japanese–English bilinguals (native Japanese speakers) were each asked to translate the IDI into Japanese. Each had completed post-graduate work in the United States (one at the doctoral level). One of the translators, who had completed an MBA at an American university, was working in the United States for a large Japanese multinational corporation. Two others, one of whom was also a certified IDI administrator, were English teachers working in Japan. The fourth had completed a Ph.D. from Columbia University in second-language acquisition, and was also living in the United States.

My qualifications for leading the translation team are as follows. I am a native English speaker who has spent over half of my adult life in Japan. I have passed the highest level of the Japan Foundation’s Japanese Language Proficiency Test (Nihongo Noryoku Shiken). I have done a large number of Japanese-to-English translations and published an English-to-Japanese translation of My Friend David (Edwards & Dawson, 1983) entitled Mai Frendo Deibido (Greenholtz & Morita, 1988). I was certified as a Japanese–English court interpreter by the government of Ontario in 1991 and serve as a consecutive interpreter in a variety of contexts. I am also a certified administrator of the IDI.

I selected a final version from among the four independent translations and gave it to two native-Japanese-speaking doctoral students at a Canadian university. They used the final version to rate students’ translations of the IDI, an exercise that forced them to consider the items much more deeply than simply comparing them to the English originals, or performing a back translation, would have, because they had to justify the score they assigned each of the students’ translations.

Following the rating phase, I discussed each item with the two raters for fidelity to the language and concepts of the original, within the constraints of natural Japanese. Several changes to the wording of items were suggested and many of those changes were incorporated into what became the Japanese-language version used in the present study.

I believe that the method employed was superior to a translation/back translation approach (the traditional ‘gold standard’), conforming more closely as it does to the new ‘gold standard’ described in Kristjansson, Desrochers, and Zumbo (2003) who (citing Behling & Law, 2000; Hambleton & Patsula, 1998) point out, “direct translation and back translation can deal with literal meaning only”, and “(B)ack translation cannot detect differences in conceptual understanding of the question, and so cannot ensure psychological equivalence of the items in a scale or questionnaire” (p. 135, italics added). Although my intention was to use as literal a translation as possible, I also needed to remain aware of conceptual inconsistencies or difficulties posed by the IDI because I was not interested only in producing a translation. I was also building an evidentiary validity argument with respect to the translation. Being aware of the conceptual nuances would help me to make sense of the principal components analysis.

3.2. Principal components analysis

Confident of the quality of the translation, I moved to the next phase in the validation process: determining whether the Japanese IDI mapped the same constructs underlying the English version; whether it was, language aside, the same instrument.

To assess this, 400 IDIs were analysed using principal components analysis, first with direct quartimin rotation (see Zumbo & Taylor, 1993) to determine whether there were any correlations among the factors (Table 1). Once I had confirmed that there were no significant correlations, varimax rotation was used to determine whether the factor structure of the Japanese version matched that of the English original, for this population. The results were further confirmed with a factor analysis (maximum likelihood, varimax rotation).

Table 1

Component correlation matrix

Component      1       2       3       4       5       6       7
1          1.000
2           .093   1.000
3          -.042   -.088   1.000
4           .187    .224   -.116   1.000
5          -.205   -.015   -.052   -.319   1.000
6          -.087    .039   -.058   -.087    .145   1.000
7           .167   -.010   -.077    .022   -.012    .011   1.000

Extraction method: principal component analysis.

Rotation method: oblimin with Kaiser normalization.

I used a hybrid exploratory/confirmatory approach for the principal components analysis. A confirmatory factor analysis requires a larger subject pool (n = 600, as a rule of thumb, see Kelloway, 1998). The mathematics underlying the confirmatory and exploratory analyses are less at issue than the approach that the researcher takes in attempting to interpret the data because factor analysis is substantially an interpretive art (see, for example, Fabrigar, Wegener, MacCallum, & Strahan, 1999). In this instance, the analysis was not entirely exploratory because the results of the analyses of the original English IDI guided the selection of the general parameters for this one. However, since I could not be certain that the Japanese version would be identical to the original, the analysis was not wholly confirmatory either. Principal components analyses were performed for four- through seven-factor solutions to find the best fit. Knowing that there ought to be around six factors permitted me to select initial eigenvalues of greater than 1.5 (rather than the more traditional rule-of-thumb of 1.0), which greatly simplified the analysis.
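The effect of raising the eigenvalue cut-off from 1.0 to 1.5 can be illustrated with a short sketch. The Python below is not the analysis reported here: it assumes NumPy, uses synthetic responses (400 hypothetical respondents, 60 items, six simulated latent factors), and the function name retained_components is introduced only for illustration. It shows only how the number of retained components depends on the chosen threshold.

```python
import numpy as np


def retained_components(data, threshold):
    """Eigenvalues of the item correlation matrix that exceed `threshold`,
    sorted from largest to smallest."""
    corr = np.corrcoef(data, rowvar=False)          # items are columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]    # eigvalsh returns ascending order
    return eigenvalues[eigenvalues > threshold]


# Synthetic data (an assumption, not the study's data):
# 400 respondents, 60 items, each item driven by one of six latent factors.
rng = np.random.default_rng(0)
n_respondents, n_items, n_factors = 400, 60, 6
latent = rng.normal(size=(n_respondents, n_factors))
loadings = np.zeros((n_factors, n_items))
for item in range(n_items):
    loadings[item % n_factors, item] = 0.7
responses = latent @ loadings + 0.7 * rng.normal(size=(n_respondents, n_items))

kaiser = retained_components(responses, 1.0)     # traditional rule of thumb
stricter = retained_components(responses, 1.5)   # cut-off used in this study
print(len(kaiser), len(stricter))
```

A stricter cut-off can never retain more components than a looser one; with real questionnaire data, the gap between the two counts indicates how many marginal components the higher threshold sets aside.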

4. Results

The four-, five-, six-, and seven-factor solutions were interpreted using the rotated component matrix, the scree plot, and nonredundant residuals with absolute values greater than 0.05, to determine which solution made the most sense both statistically and intuitively.

The seven-factor solution proved to be the most satisfying.

The first factor corresponded to the IDI’s Adaptation dimension, although it came out as one factor rather than the two that the original IDI contains. However, while they could not be said to be separate factors, Cognitive Adaptation items clustered together, as did the Behavioural Adaptation items. An argument could be easily made for two subscales within a main factor; not very far removed from the IDI’s structure. Overall, 14 of the 20 Adaptation items loaded on this factor.

The second factor was clearly the IDI’s Defence dimension, with all ten items loading cleanly and exclusively (factor loadings ranging from .754 to .499) on this factor.

The third factor was slightly less clean. It contained seven of ten items from the IDI’s Acceptance dimension, but the highest loading item on the factor (.649) was an orphaned Cognitive Adaptation item and there were also three items from the Denial dimension in the mix. Overall, the factor was dominated by the Acceptance dimension, with loadings for Acceptance items ranging from .612 to .380. Given that Acceptance and Adaptation are adjacent dimensions on the scale, the fact that an Adaptation item would load highly on the Acceptance factor is not beyond comprehension. The presence of three items from Denial, a dimension on the far side of the ethnocentric–ethnorelative divide, gave cause for concern, however.

The fourth factor (following a transition band containing two orphaned Adaptation items) contained five of the items from the Denial scale. This is only half of the Denial items, but they clearly dominated the factor with loadings ranging from .690 to .381.

The fifth factor roughly corresponded to the Minimisation dimension of the IDI, containing five of the ten Minimisation items.

Table 2

Rotated factor matrix

Item  Factor: 1  2  3  4  5  6  7

24-(mo) Cognitive Adaptation-Bridge Builder  .673  .024  -.025  .257  -.041  .015  -.015
18-Cognitive Adaptation-Bridge Builder  .633  .030  .017  .085  -.062  .048  .033
53-Cognitive Adaptation-Bridge Builder  .606  -.104  -.019  -.140  -.089  -.048  -.027
50-Behavioural Adaptation-Behavioural Shift  .597  .044  .226  .134  -.012  .088  -.016
60-Cognitive Adaptation-Bridge Builder  .597  .061  .225  .020  .047  -.075  .087
25-Cognitive Adaptation-Frame Shifting  .589  .021  .185  -.006  .058  .011  .015
52-Cognitive Adaptation-Frame Shifting  .573  -.079  -.086  -.058  -.071  -.152  .036
46-Cognitive Adaptation-Frame Shifting  .547  -.066  -.025  -.023  .020  -.048  -.031
37-Denial-Disinterest  .536  -.006  .014  .286  .039  -.005  .020
36-Behavioural Adaptation-Cultural Complexity  .499  -.092  -.038  .085  .032  -.045  -.062
26-Behavioural Adaptation-Cultural Complexity  .462  .050  .083  .092  -.031  .035  .076
13-Behavioural Adaptation-Behavioural Shift  .442  .093  .069  -.075  -.051  -.024  .023
35-Behavioural Adaptation-Cultural Complexity  .440  .041  .103  .063  -.096  .008  .049
3-Cognitive Adaptation-Frame Shifting  .405  .019  .070  -.006  .091  .014  .005
9-(nearly) Behavioural Adaptation-Cultural Complexity  .385  -.093  .087  .247  -.105  .048  -.070
7-Behavioural Adaptation-Behavioural Shift  .365  .038  .057  .066  .131  -.108  .002
58-Behavioural Adaptation-Behavioural Shift  .331  -.041  .254  -.164  -.067  -.139  .047
17-Acceptance-Enjoying Difference  .322  .126  .212  .165  .024  .094  -.036
54-Behavioural Adaptation-Behavioural Shift  .268  -.054  .154  -.222  -.108  -.100  .035
20-Defence-Denigration  .015  .732  .135  -.005  -.045  .115  .015
28-Defence-Denigration  .056  .708  .174  .051  .063  .052  .003
41-Defence-Denigration  .054  .681  .237  .169  .035  -.069  .033
16-Defence-Superiority  -.004  .678  .125  .042  -.006  .100  .036
55-Defence-Superiority  -.058  .653  .214  .216  -.046  -.031  .036
11-Defence-Superiority  .010  .644  .042  .013  .077  .095  -.015
39-Defence-Superiority  -.023  .582  .228  .095  .196  .018  -.073
10-Defence-Denigration  -.034  .517  .123  .098  -.011  .087  .035
44-Defence-Denigration  .049  .514  .137  .377  .045  -.021  .053
56-Defence-Superiority  -.108  .458  .222  .035  .202  .060  -.034
14-Denial-Avoidance Separation  .030  .240  .153  .141  -.092  -.024  -.098
31-Cognitive Adaptation-Multiple Perspective  .057  .070  .566  .050  .009  -.068  -.035
29-Acceptance-Enjoying Difference  .019  .119  .564  .196  -.106  -.022  .103
32-Acceptance-Describing Difference  .025  .090  .524  -.052  .085  .047  .132
57-Denial-Disinterest  .117  .262  .509  .147  .017  .012  -.058
27-Acceptance-Learning Difference  .343  .112  .470  .190  -.045  -.014  -.032
34-Minimisation-Superficial Differences  .001  -.139  .460  -.143  .419  -.046  .067
21-Acceptance-Value Relativity  .069  .183  .448  .089  .008  .073  .047
15-Denial-Disinterest  .108  .233  .423  .209  -.066  .004  -.054
19-Acceptance-Learning Difference  .105  .099  .391  .071  -.068  .066  -.004
33-Acceptance-Describing Difference  .040  .171  .377  .077  -.050  -.020  -.045
1-Denial-Disinterest  .059  .167  .359  .143  -.027  -.047  -.160
47-Acceptance-Enjoying Difference  .073  .140  .323  .075  .028  .098  .052
45-Cognitive Adaptation-Multiple Perspective  .245  .028  .283  -.035  -.119  .055  .128
42-Behavioural Adaptation-Behavioural Shift  .201  .052  .224  -.030  -.160  .140  .057
38-Denial-Avoidance Separation  .096  .222  .277  .668  -.008  -.019  .028
40-Denial-Avoidance Separation  .151  .216  .264  .655  .041  .046  -.091
43-Denial-Avoidance Separation  .081  .333  .199  .640  .054  .027  -.033
49-Denial-Avoidance Separation  .186  .241  .339  .566  -.068  .015  -.030
30-Denial-Avoidance Separation  .103  .298  .266  .364  -.065  .052  -.007
6-Minimisation-Human Similarities  -.111  .129  -.073  -.167  .126  .021  .115
22-Minimisation-Human Similarities  -.004  -.005  -.071  -.056  .670  -.010  .173
4-Minimisation-Superficial Differences  .023  .015  -.091  .045  .564  .025  -.008
51-Minimisation-Human Similarities  -.100  .187  .038  -.152  .498  .206  -.053
8-Minimisation-Superficial Differences  .032  .048  .033  .054  .477  .083  -.046
5-Minimisation-Universal Values  -.022  .033  -.068  .022  .429  .091  .097
59-(sb) Minimisation-Universal Values  -.075  .112  .054  .057  .101  .834  -.041
12-(sb) Minimisation-Universal Values  -.087  .149  .060  .023  .168  .695  .065
23-(sb) Minimisation-Universal Values  -.052  .056  .040  .000  .194  .305  .156
2-(ms) Acceptance-Value Relativity  .034  -.019  .039  -.028  .124  .092  .851
48-(ms) Acceptance-Value Relativity  .123  .011  .038  -.054  .053  .024  .624


The last two factors did not fit the IDI model, but made sense in the context of cross-cultural issues that emerged from the translation process.

The sixth factor was a neat cluster of the three items on the Universal Values subscale of the Minimisation dimension, based on the notion of all people being ‘children of a spiritual being’ (see Section 5).

The seventh factor contained two items that had been flagged from the translation for two reasons. One was the omission of the modifier ‘some’ from the Japanese rendition. The second was that they were actually the same item. The original items are nearly identical except that one asks whether it is appropriate for cultures to have different conceptions of ‘right and wrong’ and the other, different conceptions of ‘good and bad’. I had not noticed that the translators had used the same term in both instances (and neither had they) until my research assistant (RA) pointed it out when we were examining why her scores for those items differed between the English and Japanese versions. The difference (in her case) was actually due to the omission of the word ‘some’ from the translations. She also had not noticed that the same Japanese translation was used for both items, but that discussion brought to light the fact that the items were identical (Table 2).
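One way to read a rotated loading matrix such as Table 2 is to assign each item to the factor on which it loads most strongly and to flag items that load saliently on more than one factor. The Python sketch below does this for five rows copied from Table 2. The .30 salience cut-off and the function name classify are assumptions made for illustration; they are not taken from the analysis reported here.

```python
# Five rows copied from Table 2: item label -> loadings on factors 1 to 7.
LOADINGS = {
    "20-Defence-Denigration": [.015, .732, .135, -.005, -.045, .115, .015],
    "34-Minimisation-Superficial Differences": [.001, -.139, .460, -.143, .419, -.046, .067],
    "59-Minimisation-Universal Values": [-.075, .112, .054, .057, .101, .834, -.041],
    "2-Acceptance-Value Relativity": [.034, -.019, .039, -.028, .124, .092, .851],
    "14-Denial-Avoidance Separation": [.030, .240, .153, .141, -.092, -.024, -.098],
}

SALIENT = 0.30  # assumed cut-off for a salient loading, chosen for illustration


def classify(row, cutoff=SALIENT):
    """Return (primary factor, list of factors with |loading| >= cutoff).

    Factors are numbered from 1, as in Table 2.
    """
    primary = max(range(len(row)), key=lambda i: abs(row[i])) + 1
    salient = [i + 1 for i, value in enumerate(row) if abs(value) >= cutoff]
    return primary, salient


for label, row in LOADINGS.items():
    primary, salient = classify(row)
    if not salient:
        note = "no salient loading"
    elif len(salient) > 1:
        note = "cross-loads on factors " + ", ".join(map(str, salient))
    else:
        note = "clean"
    print(f"{label}: primary factor {primary} ({note})")
```

Under this assumed cut-off, item 34 (a Minimisation item) is flagged as loading on both the third and fifth factors, and item 14 has no salient loading at all, which is the kind of ‘orphaned’ item described in the text.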

5. Discussion

This research was not initially intended to be about validity issues or the examination of validity in a cross-cultural context. When I learned about the IDI, I had been excited to find out that a ‘reliable and valid’ instrument existed to operationalise the DMIS, and was keen to use the IDI as a tool for gaining some empirical insights into the effectiveness of the academic exchange programme I administer. Like most practitioners, I was mostly unaware that there might be reasons to be cautious about using it in my particular research context. However, the English-language proficiency of my subject pool was not sufficiently high to use the IDI in its original form and that necessitated a translation into Japanese. Although I was still not very conversant in validity issues I knew enough about the relationship between language and culture to suspect that changing the language of the instrument might affect its reliability and validity. Still, I was hoping to quickly ‘validate’ the Japanese version of the IDI in order to use the instrument in the first stage of my research, not yet knowing that validity was a property of inferences drawn from data, not of the instrument itself. Further investigation into the field of validity made it quickly clear that differences in language and culture posed potentially critical validity issues.

This point is particularly important for other practitioners in academic exchanges or those involved in education in cross-cultural contexts who may not be conversant with validity theory. A case in point is a recent study by Westrick (2004) who used the IDI in a quantitative analysis of the effect of service learning on the intercultural sensitivity of high school students at an international school in Hong Kong. Westrick’s (2004) rationale for using the IDI is that:

Psychometric analysis shows that ‘the IDI is a highly reliable measure which has little or no social desirability bias and also reasonably approximates the developmental model of intercultural sensitivity (Bennett, 1986, 1993) upon which it is based’ (Paige, Jacobs-Cassuto, Yershova, & DeJaeghere, 2003). Now that there is a reliable instrument to measure these theoretical stages (Hammer et al., 2002) there is a potential to gain insights into international school students and the different programs and strategies that can increase their intercultural sensitivity (italics added) (p. 282).

This rationale, which represents my own starting point in embarking on this research, now sets off serious alarm bells for me. First, it mentions only the reliability of the instrument, and not its validity. Second, it accepts without question that the instrument would be suitable for use in the context of high school students, from a variety of cultural backgrounds, where “over a third, 38.7 percent of the students in the sample claim nationalities in an Asian country” (Westrick, 2004, p. 286).

In her conclusion, Westrick (2004) goes on to state that, “(A)s a statistically valid and reliable instrument, the IDI has been shown to be a valuable tool for evaluating students’ level of intercultural sensitivity” (p. 296). However, she makes no mention of having considered potential validity issues in any way. This is disturbing for two reasons. The first is the uncritical perpetuation of the notion that an instrument can be validated. The second involves an aspect of validity theory that lies beyond what have traditionally been regarded as the elements of construct validity. Messick (1998) calls it consequential validity, “the relation between the evidential and consequential bases of validity” (p. 38) (about which more will be said below).

The issue of consequential validity is important here in two ways. One way to consider it is in terms of the IDI’s authors’ assurances that their instrument is reliable and valid. These assurances, from two very respected figures in the field of cross-cultural research, are what prompted me (and Westrick, I suspect, since, as mentioned earlier, a researcher can use the IDI only after attending a qualifying workshop given by the authors) to reach for the IDI as the empirical solution to my research problem.

The second way to look at consequential validity is in terms of the user of the instrument. Although the decision to use the IDI may have originated in the misconception that the ‘instrument is valid’, Hambleton and Patsula (1998) remind us that, “(A) researcher using an adapted test still has the responsibility of producing evidence of validity in the context where that adapted test is used” because “researchers risk imposing conclusions based on concepts which exist in their own cultures but which are foreign, or at least partially incorrect, when used in another culture” (p. 156). Hambleton and Patsula refer to an adapted test, but in the case at hand, while it is the subjects who are ‘adapted’ (i.e., different from the population upon which the original data inferences were validated), the same principle applies.

The only validation studies to date (Bennett & Hammer, 1998; Paige et al., 2003) have used data and inferences from respondents in the United States, using the original English version. This would not be a shortcoming in and of itself, were it not for the issue of consequential validity and practitioners’ and researchers’ propensity to believe that an instrument can be validated if rigorous statistical protocols are followed.

Messick (1995) laid out the validity issues that shaped this research as follows:

Validity is not a property of a test or assessment as such, but rather of the meaning of the test scores. These scores are a function not only of the items or stimulus conditions, but also of the persons responding as well as the context of the assessment. In particular, what needs to be valid is the meaning or interpretation of the scores; as well as any implications for action that this meaning entails (Cronbach, 1971). The extent to which scores’ meaning and action implications hold across persons or population groups and across settings or contexts is a persistent and perennial empirical question. This is the main reason that validity is an evolving property and validation a continuing process (p. 741).

The results of this study confirm that validity is in part dependent on the persons responding to the instrument, that conclusions based on concepts from one culture may be at least partially incorrect when used in another culture, and that validation is an ongoing process. The inferences that one must draw from analyses of data from the Japanese-language version of the IDI administered to Japanese subjects differ significantly from those drawn with the original IDI and English-speaking subjects in the United States. The discussion surrounding information from native-Japanese-speaking informants, in the translation process and elsewhere, will highlight that.

6. Content validity

Content validity in the original IDI was established following the guidelines for test construction set out in DeVellis (1991, as cited in Hammer, Bennett, & Wiseman, 2003). It was, to all appearances, a thorough and rigorous process. While I am not criticising the procedures followed, that very rigour, set in a context of binary decisions regarding validity (i.e., that an instrument is or is not valid) might have encouraged certain conclusions to be drawn that appear to have been unwarranted.

Ignoring the validity issues inherent in language and culture adaptation, the IDI’s authors clearly intended the instrument to be interculturally transferable. They (Hammer et al., 2003):

were initially concerned that the empirical observations upon which the DMIS was based could be re-created in systematic ways. This concern was addressed by examining discourse of people from a variety of cultures in order to determine if observers could reliably categorize the discourse in ways identified in the DMIS theoretical framework (p. 7).

Care was taken to have a broad cultural and experiential base. Hammer et al. (2003) report that:

While the pilot interviews were conducted with individual students from a variety of cultures, it was decided that the actual sample of interviewees would consist of people of varied cultural backgrounds who also extended beyond the university community. Therefore, the interview sample was selected from residents from such places as the International House in Washington, DC (where professionals from many different countries reside) as well as various places of employment in and around the Washington, DC area (p. 7).

Forty people as described above were interviewed regarding their experience with and attitudes toward other cultures. Of the 40, there were 12 US Americans of European origin and three more US Americans of South-Asian origin (bringing the number of US Americans to 15 of the 40). There were also three people from each of Britain, Japan and France, two from each of Switzerland, Korea, Ireland, and Russia, and one person from each of China, Denmark, Spain, France, Germany, Estonia, India, Turkey, Ecuador, Guyana, and Ivory Coast.

There are a number of ways to interpret these data besides the conclusion drawn by Hammer et al., that they had a sufficiently culturally diverse mix to assure the cross-cultural robustness of the IDI. Most importantly in my view, all 40 subjects spoke English sufficiently well to participate in a lengthy interview of some conceptual sophistication: a discussion of their experience of cultural difference. Had the non-native speakers been interviewed in their native languages and quotes from the translations of those transcripts been used in the analysis, a different set of items might well have emerged, if not a DMIS of a different texture.

This conclusion is not merely speculative. Yamamoto (1998) used the IDI interview protocol to interview Japanese university students studying in the United States, in Japanese. First, she analysed the interview data against the DMIS stages.

Then, she classified the data into the seven categories that she found emerging naturally from them: “Attention to Physical Difference; Physical Admiration for Caucasians; Attention to Physical Similarity; Attention to Own Frame of Reference; Naturalness of Difference; Inevitability of Difference; and Suspension of Judgement” (p. 77). She states that “(T)hese emergent categories are closely related to Japanese cultural values and perceptions of reality” (p. 77). Going on to compare the DMIS to her emergent categories, she found that IDI:

statements such as ‘I appreciate (enjoy or respect) cultural differences’ were hardly expressed by the Japanese students ... Also, the students paid much attention to the differences and similarities in physical appearance, which they associated with the degree of discomfort or comfort (p. 77).

Yamamoto (1998) concluded that:

These results suggest that the definitions of each stage may need some modification in order to understand intercultural sensitivity in the Japanese context. It might be possible to say that what Japanese perceive as differences/similarities or how they deal with difference/similarities are different from or not included in the stages of the model. These aspects need to be considered and added to the model in order to modify it to apply in the Japanese context (pp. 77–78).

Returning to the subject pool, another way to look at it is to note that its members are overwhelmingly from Judeo-Christian backgrounds, assuming that the 28 subjects of European descent were members of the mainstream. Socialisation into a particular religio-cultural worldview has obvious implications in the context of tolerating, appreciating, or enjoying difference.

For the next step in the instrument-building process, culling utterances to include in the IDI, four members of the research team each reviewed 25 randomly selected transcripts from among the original 40. Hammer et al. (2003) report that they “rated the DMIS orientations the interviewees’ (sic) most consistently expressed during the interview” (p. 8). These resulted in an item pool of 239 IDI sample items for further pilot testing.

Although adequate inter-rater reliabilities were obtained in this process, ranging from .66 (fair) to .86 (excellent), we do not know how many utterances survived from non-US Americans, for example, or non-native speakers of English. It could very well be that the utterances that consistently reflected the DMIS orientations during the interview came from particular cultural or linguistic subsets of subjects, a threat to validity that Messick (1995) calls “construct underrepresentation” (p. 742).

Another potential shortcoming of the content-validation process was that although Hammer et al. (2003) tell us that the pool of 239 quotes resulting from the process outlined above was twice tested in a pilot version of the IDI, “with a culturally diverse group of people” (p. 9), the pilot testing was looking for “clarity of instructions, item clarity, response option applicability, and overall amount of time taken to complete the instrument” (p. 9), but not conceptual transferability.
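To make the inter-rater figures cited above more concrete, the following is a minimal sketch of one common chance-corrected agreement statistic, Cohen's kappa, for two raters assigning DMIS orientations to the same transcripts. Hammer et al. (2003) are not quoted here as saying which statistic they used, so kappa is an assumption, and the ratings and the names `cohens_kappa`, `rater_1`, and `rater_2` are hypothetical, invented purely for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same transcripts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of transcripts on which the two raters chose the same stage.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight transcripts (not data from the IDI studies).
rater_1 = ["Denial", "Defence", "Minimization", "Acceptance",
           "Acceptance", "Adaptation", "Minimization", "Defence"]
rater_2 = ["Denial", "Defence", "Minimization", "Acceptance",
           "Adaptation", "Adaptation", "Minimization", "Minimization"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

A statistic of this kind describes only whether raters agree with one another. It says nothing about whose utterances the agreed-upon items came from, which is the gap identified in the discussion above.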

The surviving items (assuming that some of the items failed the test of item clarity and response-option applicability) were then given to another group of DMIS experts who further distilled them into the 60 items that comprise version one of the IDI. This methodology is not without redeeming features. It has the twin merits of employing items that were not generated by experts sitting down to write what seemed to be reasonable items that could then be field tested, and a statistically rigorous process involving agreement by experts on the strength of the relationship of the items to the IDI. However, it lacks rigour in its failure to confirm the cross-cultural robustness of the items.

While the authors claim that the use of a culturally-mixed pool of interviewees demonstrates pan-cultural content validity, in addition to simple content validity, I believe that the arguments raised in this discussion are sufficient to bring that into question. Further research with non-Indo-European language versions and with subjects from other than Judeo-Christian backgrounds is certainly warranted before the pan-cultural applicability of the IDI can be asserted with any confidence.

7. Construct validity

The principal components analysis yielded important insights both into the IDI and also into the validation process itself. The number of subjects in this study was larger than the group used by Paige et al. (2003) in their analysis (n = 330) and also larger than the original IDI validation study (n = 226). It is noteworthy that 16 of the 60 items, 26.7% of the instrument, did not map onto their predicted IDI stages. That indicates a need for a deeper analysis of those items, but of a culturally and qualitatively different kind than that which comprised the validation process for the original or the replication by Paige et al. (2003).

While the fact that some of the items do not perfectly map onto their intended IDI stages is not a novel finding, it is interesting to follow how the lack of fit has been variously interpreted.

Consider the Minimization stage of the IDI as an example. In the Japanese translation, the three Transcendent Universalism items clustered as a factor unto themselves (see Section 4). They had been flagged during the translation process as being potentially problematic on a conceptual level and that prediction was borne out in the principal components analysis. We can trace similar results through the original validation process (Bennett & Hammer, 1998). Bennett and Hammer’s original factor analysis:

resulted in a ten-item scale with reasonably high corrected item–total correlation and a reliability estimate of .87 (n = 212). These items reflect a mixture of both “Physical Universalism” and “Transcendent Universalism” within the Minimization stage description of the DMIS and therefore were labelled with MINIMIZATION (p. 67).
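The two statistics named in this passage can be sketched briefly. The example below assumes that the reliability estimate is of the Cronbach's alpha kind, which the quoted passage does not state, and it uses a small set of hypothetical Likert-type responses rather than any IDI data. The function names are illustrative only.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (neither constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def cronbach_alpha(items):
    """items: one list of scores per item, each with one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale score per respondent
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def corrected_item_total(items, index):
    """Correlate one item with the sum of the remaining items."""
    rest = [sum(scores) - scores[index] for scores in zip(*items)]
    return pearson(items[index], rest)


# Hypothetical responses: three items answered by five respondents.
responses = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 4, 5, 5],
]

alpha = cronbach_alpha(responses)
item_totals = [corrected_item_total(responses, i) for i in range(len(responses))]
print(f"alpha = {alpha:.2f}")
print([round(r, 2) for r in item_totals])
```

Both quantities measure how consistently respondents answer a set of items. A high value shows that the items hang together statistically. It does not show that respondents in another language understood the items in the way intended.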

So, in the original analysis of the factor structure, the items in Minimization were deemed to be sufficiently correlated and coherent to form a single dimension with two subscales.

When Paige et al. (2003) performed their analysis, their conclusion was also that while:

... the analyses of the internal structure of the IDI have shown it to be a reasonable approximation of the theoretical model of intercultural development ... Minimization items split along the theoretical lines of physical and transcendental universalism (Factors 3 and 4) (p. 483).

They went on to say that, “(T)he Minimization split is interesting because it follows exactly the form structure of the DMIS” (p. 484) and speculated that their subjects had responded to the Transcendence items the way they had because “it is very likely that many individuals of the age groups represented in our sample have not given any serious thought to their position on the issues of spirituality and religious beliefs” (p. 484).

On the surface, the results from the principal components analysis of the Japanese IDI and those obtained by Bennett and Hammer (1998) and Paige et al. (2003) are in agreement. All three show a split structure for Minimization with Transcendent Universalism clustering separately from Physical Universalism.

However, my Japanese informants (the translators, my RA, and the raters who helped me to refine the items) unanimously noted that the Transcendent Universalism items were conceptually nonsensical in Japanese culture, despite the fact that the words could be translated. As one of the translators put it in private e-mail correspondence, “I don’t understand this whole sentence . . . M [another translator] avoided translating all ‘the children of a spiritual being’ [items]. I don’t know what that is either. It’s possible to put Japanese words together as a translation without understanding what that means. demo hen” [but it’s weird (author’s translation)].

Yamamoto and Tanno (2002) cite a similar difficulty. They say (in translation from the original Japanese by this author) that:

I made an effort to be as faithful to the original wording of the IDI as possible, but some expressions came out as difficult to understand in translation. However, it wasn’t a simple translation problem. It is possible that some of the items were difficult to understand culturally. For example, in the Universal Values subscale of the Minimization stage items 12, 23 and 59 are phrased as ‘At our root is a supernatural holy being’ (choshizende shinseina sonzai), but the English original for ‘supernatural holy being’ was ‘spiritual being’. I thought about translating it [in a syllabic katakana rendition] as ‘supirichuaru bi’ingu’, but trying to be faithful to the original, I checked with Bennett and as a result I used ‘supernatural holy being’. However, some doubt remains as to whether the value held by those who believe we are united under a supernatural being is an appropriate expression of what the Minimization stage of intercultural sensitivity means to Japanese (p. 40).

Although the most glaring examples are the items that comprise the Transcendent Universalism subscale of Minimization, clearly products of a Judeo-Christian mainstream culture, this is not a quibble over three items out of 60. It is fundamental to the way the instrument was constructed: out of quotes from actual interviews, not fabricated items.

Bennett and Hammer (1998) and Paige et al. (2003) looked at the data through their own cultural and linguistic lenses and concluded that it was an accurate reflection of the DMIS. If that conclusion were limited to the English version of the IDI, administered to highly proficient speakers of English, there would be no problem. However, I believe Bennett and Hammer were being premature at best in assuring people who attend IDI workshops that the IDI is valid with any cultural group (and, by implication, in any language version), because they had used a culturally and linguistically diverse pool of subjects in the item-building phase. From the results I obtained and the conclusions drawn by Yamamoto (1998) it is clear that had the original interviews been conducted in Japanese, for example, the items appearing on the IDI would have been at least somewhat, perhaps substantially, different.

Other issues came to light during consultations on the translation. One recurring issue was the IDI’s references to ‘being a member’ of one’s own culture as juxtaposed with people from other cultures. Although some of the phrases sound (to my ear) a bit laboured even in English, e.g., ‘Although I feel I am a member of my own culture ...’, they are otherwise unremarkable in English. To the Japanese, references to one’s own and other people’s cultures have an unnatural clang. As one translator put it, “I don’t think Japanese people distinguish people by culture, but by country or nationality” (personal correspondence). In the Japanese worldview there are two types of people in the world, nihonjin (Japanese) and gaijin or gaikokujin (literally ‘outside’ people or ‘foreign-country’ people), i.e., non-Japanese.

It might be this proclivity for dichotomising the world that led three of the Denial–Disinterest subscale items and one Minimization–Superficial Differences item to fall into factor four, which was primarily an Acceptance (seven out of 12 items) factor, in the principal components analysis. There are four Acceptance subscales: Value Relativity, Enjoying Difference, Describing Difference, and Learning Difference. It seems that for Japanese respondents, there might have been a ‘Difference’ factor rather than an Acceptance factor. This notion is reinforced by Yamamoto’s (1998) list of dimensions that spontaneously emerged from her interviews in Japanese with Japanese subjects. In her analysis, three of the seven dimensions, Attention to Physical Difference, Naturalness of Difference, and Inevitability of Difference, cited difference specifically, with another, Attention to Physical Similarity, invoking difference as its opposite.

This is particularly contentious for construct validity with this population because, as noted above, Denial and Minimization lie on the opposite side of the ethnocentric–ethnorelative divide from Acceptance and so, unlike items from adjacent dimensions crossing factors, are theoretically incompatible within the same factor.
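The check described above, whether an item lands in a factor dominated by stages from the other side of the ethnocentric–ethnorelative divide, can be sketched in a few lines. The item labels, loadings, and function names below are hypothetical and chosen only for illustration. They are not the loadings obtained in this study, and assigning each item to the factor with its largest absolute loading is a simplifying assumption.

```python
from collections import Counter, defaultdict

ETHNOCENTRIC = {"Denial", "Defence", "Minimization"}
ETHNORELATIVE = {"Acceptance", "Adaptation", "Integration"}


def side(stage):
    """Return which side of the DMIS divide a stage belongs to."""
    if stage in ETHNOCENTRIC:
        return "ethnocentric"
    if stage in ETHNORELATIVE:
        return "ethnorelative"
    raise ValueError(f"unknown DMIS stage: {stage}")


def assign_to_factors(loadings):
    """Map each item to the index of the factor with its largest absolute loading."""
    return {
        item: max(range(len(row)), key=lambda k: abs(row[k]))
        for item, row in loadings.items()
    }


def cross_divide_items(loadings, predicted_stage):
    """List items whose predicted stage sits across the divide from their factor's majority stage."""
    by_factor = defaultdict(list)
    for item, factor in assign_to_factors(loadings).items():
        by_factor[factor].append(item)
    flagged = []
    for items in by_factor.values():
        majority_stage, _ = Counter(predicted_stage[i] for i in items).most_common(1)[0]
        flagged.extend(
            i for i in items if side(predicted_stage[i]) != side(majority_stage)
        )
    return sorted(flagged)


# Hypothetical loadings on two factors for six items (illustration only).
predicted_stage = {
    "q1": "Acceptance", "q2": "Acceptance", "q3": "Acceptance",
    "q4": "Denial", "q5": "Defence", "q6": "Defence",
}
loadings = {
    "q1": [0.71, 0.10], "q2": [0.65, 0.05], "q3": [0.58, 0.22],
    "q4": [0.52, 0.18], "q5": [0.08, 0.69], "q6": [0.12, 0.74],
}

flagged = cross_divide_items(loadings, predicted_stage)
print(flagged)
```

In this toy example the Denial item `q4` is flagged because it loads with the Acceptance items. A flag of this kind only locates the anomaly. Explaining it still requires the sort of cultural and linguistic information that came out of the translation process.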

Had I relied entirely on formulaic validation protocols consisting of inter-rater reliability statistics and the numerical rules of thumb for factor analyses I might have drawn the same conclusions as previous validation studies, particularly if I had been working from the premise that the instrument, rather than the inferences made with it, could be validated. However, interpreting the results of the principal components analysis through the lens of information from the translation process and Japanese cultural informants made that impossible.

8. Conclusion

The validity issues raised in this paper should have real resonance for practitioners, and even for academics in the field who reach for an instrument because they believe ‘it is valid’.

This is the first study to examine the validity of inferences made with the IDI in translation, with a culturally different population. In doing so it raised strong doubts about the cross-cultural transferability of version one of the IDI and raised some questions about the DMIS as a model for understanding worldviews with respect to difference, in cultures other than US American. By introducing a culturally sensitive alternative analysis to what appeared to be similar data sets, the study highlighted the importance of current thinking in validity theory that instruments cannot be validated, only specific inferences made from data. Issues related to consequential validity in the use of an instrument or a theoretical framework in contexts other than the ones for which they were specifically developed were also raised.

These results indicate that the IDI should still be considered to be a work in progress, at least in a cross-linguistic environment, rather than a ‘reliable and valid instrument’ ready to pull off the shelf for all research contexts. A lot of room remains for further research in non-US American cultures, using subject utterances in languages other than English. There are also obvious implications, by extension, for exploring whether the DMIS actually taps a universal ‘deep cognitive structure’ of the development of intercultural sensitivity or whether it, too, is culture bound.

Acknowledgements

The author would like to gratefully acknowledge Drs. Milton Bennett and Mitchell Hammer who permitted him to translate the IDI and use it in his research and Dr. Bruno Zumbo, Professor in the Department of Educational Psychology and Special Education at the University of British Columbia, for his generous assistance with the statistical analysis.

References

American Psychological Association (APA), AERA, & National Council on Measurement in Education (NCME). (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Bennett, M. (1986). Towards ethnorelativism: A developmental model of intercultural sensitivity. In M. Paige (Ed.), Education for the intercultural experience (2nd ed.). Yarmouth, ME: Intercultural Press.

Bennett, M., & Hammer, M. (1998). The intercultural development inventory (IDI) manual. Portland, OR: Intercultural Communication Institute.

Behling, O., & Law, K. S. (2000). Translating questionnaires and other research instruments: Problems and solutions. Thousand Oaks, CA: Sage.

Bhawuk, D. P. S., & Brislin, R. (1992). The measurement of cultural sensitivity using the concepts of individualism and collectivism. International Journal of Intercultural Relations, 16, 413–436.

DeVellis, R. F. (1991). Scale development. Thousand Oaks, CA: Sage.

Edwards, J., & Dawson, D. (1983). My friend David. A sourcebook about Down’s Syndrome and a personal story about friendship. Portland, OR: Ednick Communications.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. T., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299.

Greenholtz, J., & Morita, Y. (1988). Mai Frendo Deibido. Kyoto, Japan: Dohosha Publishers.

Hambleton, R. K., & Patsula, L. (1998). Adapting tests for use in multiple languages and cultures. Social Indicators Research, 45, 153–171.

Hammer, M. R., Bennett, M. J., & Wiseman, R. (2003). Measuring intercultural sensitivity: The Intercultural Development Inventory. In R. M. Paige (Guest Ed.). International Journal of Intercultural Relations, 27(4), 421–443.

Kelloway, E. K. (1998). Using LISREL for structural equation modelling: A researcher’s guide. Thousand Oaks, CA: Sage.

Kristjansson, E. E., Desrochers, A., & Zumbo, B. D. (2003). Translating and adapting measurement instruments for cross-linguistic and cross-cultural research: A guide for practitioners. Canadian Journal of Nursing Research, 35(2), 127–142.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific enquiry into score meaning. American Psychologist, 50, 741–749.

Messick, S. (1998). Test validity: A matter of consequence. Social Indicators Research, 45, 35–44.

Paige, R. M., Jacobs-Cassuto, M., Yershova, Y., & DeJaeghere, J. (2003). Assessing intercultural sensitivity: An empirical analysis of the Intercultural Development Inventory. In R. M. Paige (Guest Ed.). International Journal of Intercultural Relations, 27(4), 467–486.

Westrick, J. (2004). The influence of service-learning on intercultural sensitivity: A quantitative study. Journal of Research in International Education, 3(3), 277–299.

Yamamoto, S. (1998). Ibunkasenshitivity moderu wo nihonhin ni taioh suruniattate – saiteigi no hitsuyousei ni tsuite (Applying the developmental model of intercultural sensitivity in Japanese context). Journal of Intercultural Communication, 2, 77–100.

Yamamoto, S., & Tanno, D. (2002). Ibunkakanjuseihattatsushakudo (The Intercultural Development Inventory) no nihonjin ni taisuru tekiyohsei no kentoh: Nihongo bansakusei wo shiya ni irete (Assessing the applicability of the Intercultural Development Inventory to Japanese: Bringing in the viewpoint from the creation of a Japanese version). Journal of the Aomori National University, 7(2), 24–42.

Zumbo, B. D., & Taylor, S. V. (1993). The construct validity of the extroversion subscales of the Myers–Briggs type indicator. Canadian Journal of Behavioural Science, 25(4), 590–604.
