
What Is a Language Community?
Author(s): David D. Laitin
Source: American Journal of Political Science, Vol. 44, No. 1 (Jan., 2000), pp. 142-155
Published by: Midwest Political Science Association
Stable URL: http://www.jstor.org/stable/2669300


What Is a Language Community? David D. Laitin Stanford University

Theories of nationalism, democracy, regional assertiveness, and civil war have relied on vague and unspecified notions of linguistic heterogeneity, based upon estimates of the "mother tongues" of a population. Under conditions of mother-tongue diversity, the criterion for a language community that requires structural proximity of languages is equally problematical, and here linguistics gives us little theoretical guidance or empirical data. Another criterion of language diversity measures the probability that two random people in a country will share a language. Empirical problems (on getting standards of "knowing" a language) and theoretical ones (concerning whether the ability to communicate is sufficient for an indicator of cultural homogeneity) beset these measures. In light of these problems, the paper specifies several new measures that might be used for coding a language community. Data collected from six post-Soviet republics illustrate the potential usefulness of these measures.

Ethnic heterogeneity is often portrayed as a powerful source of democratic instability, regional assertiveness, and civil war. In his classic essay on primordial conflict, Geertz (1973) sees it as a source of chronic tension in the postcolonial states after World War II. Dahl (1971) sees it as a serious constraint on the success of democracy. Rabushka and Shepsle (1972) model ethnic heterogeneity such that it leads in equilibrium to the breakdown of democratic regimes. Connor (1994) equates ethnic heterogeneity with a higher probability of civil war. But not all studies link heterogeneity with unhappy outcomes. Lijphart (1977), for one, showed the possibility for democracy (of the nonmajoritarian sort) under conditions of cultural pluralism.

The debate on the influence of ethnic heterogeneity on political outcomes has remained vibrant for nearly forty years. Yet, a major problem in testing such claims is lack of consensus on how to measure either the degree of ethnic heterogeneity within a polity or the degree of difference between any two ethnic groups in that polity. It may seem obvious that the Hirschman-Herfindahl concentration index used for political parties would be ideal for an index of ethnic dispersion, and thus that all you need to know is the percentage of each ethnic group of an entire population. However, members of ethnic groups are not out there to be counted as are votes for political parties. In fact, the clear identification of ethnic groups as entities is often the result of their mobilization due to political instability and regional conflicts of interest. But if ethnic mobilization becomes the criterion for ethnic groupness, there is a problem, as the value of the independent variable becomes dependent on the value of the dependent variable. The methodological task is to develop valid indicators of ethnic heterogeneity that are independent of our observations on the dependent variables whose values are being explained.

One way out of this problem is to use language as a proxy for ethnicity. The percentage of the population whose historic language is Welsh, or Catalan, or Luo therefore substitutes for a more subjective accounting of the percentage of Welsh, Catalans, or Luos in Britain, Spain, or Kenya. Use of language data in this way allows us not only to measure the degree of group dispersion within a polity, but also to measure (by means of the structural difference between languages) the degree of cultural difference between ethnic groups. To be sure, ethnic differentiation in numerous cases around the world is based on criteria other than language. Nonetheless language has the advantage that it can be measured independently of the dependent variables that concern political scientists interested in ethnic issues, and coding on language can allow for useful statistical tests of received theory.

David D. Laitin is Professor of Political Science, Stanford University, Stanford, CA 94305-2044 ([email protected]).

This paper was originally presented at the 1998 Annual Meeting of the American Political Science Association, September 3-6, 1998. Richard Anderson, Robert Bates, Nathaniel Beck, Amitai Etzioni, James Fearon, Pieter van Houten, and Abram de Swaan provided critical comments on earlier versions of this paper.

American Journal of Political Science, Vol. 44, No. 1, January 2000, Pp. 142-155

© 2000 by the Midwest Political Science Association

This solution, even ignoring the ethnic divisions that are not based on language, is not entirely satisfactory. For one, language heterogeneity is subject to many of the same coding problems as ethnic heterogeneity. For example, in Gurr's "Minorities at Risk" database, linguistic difference (coded as CULDFX2) has a positive and statistically significant relationship to levels of ethnic rebellion over a variety of specifications. In Laitin (1999), I found the coding on Gurr's measure of linguistic heterogeneity to be inconsistent across cases. Using measures derived from Ethnologue (Grimes 1996), I respecified Gurr's models and found that language differences between a minority and the majority group of a polity correlated significantly with lower levels of group rebellion. When a different measure of a variable changes the sign of a relationship, it is time to put some effort into better specifying that variable.1

To take another example, La Porta et al. (1998), relying on a myriad of explanatory variables, find "ethnolinguistic fragmentation" to be a powerful predictor of bad government, which subsequently lowers per capita income. Their database is from sources that give gross estimates of the relative population of linguistically based groups, something which is quite distinct from the actual speech patterns or language facility of individuals. The authors do not consider the problem that under conditions of "bad" government, individuals are more likely to emphasize to census takers their linguistic differences from the ruling group. If this is so, La Porta et al.'s measure of fragmentation is endogenous to the quality of government, their dependent variable. Without independent data on individual language repertoires, it is quite tricky to establish causal relationships between fragmentation and quality of rule.

A second problem with using language data as a proxy for ethnic heterogeneity is that it assumes that people map one-to-one onto ethnic or linguistic groups. But people have multiple ethnic heritages, and they can call upon different elements of those heritages at different times. Similarly, many people throughout the world have complex language repertoires, and can communicate quite effectively across a range of apparently diverse cultural zones. In light of this, it would be useful to have data on the degree to which people within a polity can effectively communicate, as a proxy for a common national community. Karl Deutsch (1954) long recognized the need for data on the emergence of such language communities, but despite a career devoted to the collection of such data, he never specified his notion of a language community in a quantifiable way.

In the field of linguistics, there was a parallel attempt to delineate statistically a language community (Greenberg 1956). Greenberg's several specifications were virtually ignored by his disciplinary colleagues. His measures received some empirical attention in sociology (Lieberson 1981), but outside the reliance on some version of the Hirschman-Herfindahl index, which is similar to the first of Greenberg's indices, they have been ignored in political science. In this working paper, I propose a set of possible indicators to measure language diversity, language difference, and language community, starting out with Greenberg's classic measures of diversity, and relying on Lieberson's development of them. Because states, international organizations, and multinational businesses have shown little interest in collecting data on people's language repertoires (with very little value added by social science), it would be a gargantuan project collecting data on some of the measures proposed herein.2 Therefore, my hope is that if one or two of the measures proposed here turn out in preliminary analysis to have explanatory power, there would develop an institutional interest in systematic data collection for those measures.

1 This will serve as my replication footnote. The Gurr "Minorities at Risk" database can be accessed at http://www.bsos.umd.edu/cidcm/mar/. The coded Ethnologue data are in the process of being archived, but the preliminary codings used in this paper are available from the author. The surveys from the former Soviet Union, used here for illustrative purposes, were administered by Jerry Hough and me, in an NSF-supported project "Nationality and Politics: The Dismemberment of the Soviet Union" POLS/SES92125768, hereafter "the Laitin/Hough surveys." This paper's Appendix provides details on the surveys. The data from the surveys can be accessed at ftp://ftp.spc.uchicago.edu/data/laitin. For the census data I relied on Results of the 1989 USSR Population Census CD-ROM format, East View Publications.

2 It has been only a headache for states to collect information on language repertoires through the census. In Belgium, for example, the linguistic aspect of the population census, due to political tensions, was abolished in 1961. No wonder the editors of a volume on bilingualism in Brussels bemoaned the difficulty of engaging in research on urban bilingualism (Baetens Beardsmore and Witte 1987, 3) and provided no solid data on the actual level of contemporary bilingualism. On the problems of relying on census data in general for understanding bilingualism, see Lieberson (1969).

But before explicating these measures, it is important to ask why Greenberg's program was never carried out in the field of linguistics. Perhaps linguists know something about language that led them to ignore attempts to measure statistically such social facts as linguistic diversity of populations. To be sure, there are immense conceptual problems in counting people's language repertoires. Basic terms such as "mother tongue" have been abandoned by linguists, as they don't measure the same phenomenon across contexts. Many people who identify with ethnic groups whose ancestors spoke a language that is currently tipping toward extinction (Dorian 1981) will call that language their "mother tongue" even if they cannot speak it at all. Even the term "language" (as differentiated from "dialect") has no clear reference. Ethnic groups seeking political autonomy from a political center have in some cases ceased thinking of their language as a dialect of the dominant state language and have begun thinking of it as a separate language. Linguists, who understand that language and dialects have no sharp boundaries, have no technical criteria for evaluating such a claim.3 If we rely on claims of cultural entrepreneurs, political conditions (the emergence of a separatist movement which induced its followers to report what had been thought of as a dialect as a separate language) can change a country's score on linguistic diversity without any change in the language repertoires of the population. Even if we had excellent criteria for identifying languages, there would still be problems. People are very bad reporters of their own language repertoires: some lie (especially to political authorities) about their competency in certain languages; others are simply unaware of the languages (or speech forms) they use in different contexts. An even greater problem is that people might well underreport their facility in a high status language (e.g., Malays on Chinese) while they might overreport their facility in a lower status language (e.g., Kikuyus on Swahili). Therefore it is difficult to get valid indicators of competency across languages in a set of repertoires. Given what linguists have taught us about language, and the self-reporting about facility in particular languages, is it not foolhardy to collect high-n samples of language repertoires? Haven't linguists taught us to abjure such an effort?

The intractability of getting high-n data on language repertoires has not, however, been the sole reason the linguistics discipline has abandoned the effort. In the 1960s, when vast sums of money were spent in collecting cross-national data throughout the social sciences, the linguistics discipline in the U.S. was in the midst of a scientific revolution led by Noam Chomsky (1957). The central focus of linguistics moved to the concept of a universal language reflecting a unique aspect of the human brain. The study of particular languages, or the repertoires of individuals speaking those languages, decreased substantially in the research programs of American linguistics, especially those situated in linguistics departments. Scholars concerned with language repertoires, soon called the "ethnography of communication" (Gumperz and Hymes 1972), found a home in anthropology and sociology departments. In anthropology, the microscopic analysis of conversation in multilingual contexts became a core research technique, and many insights showing the contextual nature of language repertoires come from these investigations. In sociology, a dominant concern was in language planning (Rubin and Jernudd 1971), and the attempt to secure the status of dying languages (Fishman 1972). And so, in linguistics, anthropology, and sociology, with Lieberson a remarkable exception, there was little interest in procuring cross-national data on linguistic repertoires.

A linguistic repertoire (the set of speech forms that a person commands) is not, to be sure, all there is to language. Language, after all, is not only a means of communication, but it is also a marker of identity and, through its pragmatics, a cultural institution (Laitin 1977; Lucy 1999). Nor is differential command of languages the only linguistic basis for group conflict. Status differentials in speech forms (described by Ferguson 1959 as "diglossia," in which speakers of the "higher" form of a language can exclude speakers of a "lower" form from participation in certain arenas of discourse) are potential sources of intra-linguistic group conflict. In this paper, I shall be looking only at the question of how to measure linguistic repertoires. While there will be identity/cultural/status aspects of language difference that are missed by the indicators I propose, at the least the development of the indicators that I offer here will allow political scientists to separate out the communication and social mobility aspect of language (emphasized in the work of Gellner 1964) from its identity/culture/status aspects.

3 See Joseph (1987, 1-3), who relies on political factors (control over a territorial region), structural differentiation from other languages, and long-term use for education, publications, and other prestigious realms to differentiate languages from dialects. He recognizes, however, that this is rather ad hoc and that "No one has yet succeeded in establishing concrete rational criteria" in this regard.

In light of these considerations, I seek here to revive interest in the Greenberg program, applying it to the needs of recent political science theory. I concede right away that several of the conceptual problems that linguists have alerted us to in the counting of language repertoires are not solved; I concede as well that not all aspects of language are entailed in my proposed indicators. Yet the theoretical need for data on the degree to which a political unit is a "language community" impels me to open a discussion in political science on how best to construct a useful database. This paper is thus a plea for a serious disciplinary discussion on the construction of such a database. It also suggests a new criterion for measuring "language community." For empirical illustration, I will rely on urban surveys I conducted in six republics of the former Soviet Union, supplemented by data from the 1989 Soviet census. Since the surveys were not designed to determine whether the Soviet republics were language communities, the data were only imperfectly suited to the measures. And since the number of cases is so few where I do have data, no statistical tests can be run on received theories. And as I pointed out in regard to self-reporting of languages, surveys are not the ideal (nor should they be the sole) source of information for the proposed measures. Thus this "workshop" paper can only propose measures for an important variable in comparative politics; it cannot as yet recommend the best methods of collection or test any theories with newly coded variables.

Greenberg's Indices of Diversity

Greenberg's A-index

In 1956 Joseph Greenberg sought to develop a quantitative measure differentiating regions of the world in which there was great linguistic diversity from those in which there was relative linguistic uniformity. He agreed that linguists have clear impressions already about levels of diversity, but he sought a way "to render such impressions more objective, allow the comparing of disparate geographical areas, and eventually to correlate varying degrees of linguistic diversity with political, economic, geographic, historic, and other non-linguistic factors" (1956, 109). He began with an index "A," which he called the monolingual nonweighted method to determine linguistic diversity, which is the inverse of the Hirschman-Herfindahl concentration index. The equation is as follows, with i representing the proportion in the population of each mother tongue group:

$$A = 1 - \sum_{i} (i^{2}) \qquad (1)$$

Thus if all members of the population have the same mother tongue, A = 0; if half the population has one mother tongue and the other half a different one, A = .5; if the population is divided equally among three mother tongue groups, A = .67, and if there are one hundred equally sized mother tongue groups, A = .99, approaching 1. The greater the linguistic heterogeneity, the higher the score on the A-index.
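The arithmetic behind these benchmark values can be checked with a short sketch. The function name `a_index` and the tolerance check on the shares are illustrative choices, not part of Greenberg's or Lieberson's presentation; the formula itself is equation (1).

```python
def a_index(proportions):
    """A-index of equation (1): 1 minus the sum of squared mother-tongue shares.

    `proportions` holds each mother tongue group's share of the population.
    The requirement that shares sum to 1 is an assumption of this sketch.
    """
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("mother-tongue shares should sum to 1")
    return 1.0 - sum(p * p for p in proportions)


# Benchmark cases discussed in the text
print(round(a_index([1.0]), 3))              # one mother tongue
print(round(a_index([0.5, 0.5]), 3))         # two equal groups
print(round(a_index([1 / 3] * 3), 3))        # three equal groups
print(round(a_index([0.01] * 100), 3))       # one hundred equal groups
```

With k equally sized groups the index equals 1 - 1/k, so it approaches but never reaches 1 as the number of groups grows.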

Lieberson relied on this index in a paper published in 1975 (republished in 1981, 48-82) and coded for language diversity in thirty-five countries, with at least two data points for each country. The index showed great levels of diversity, ranging from .867 in 1960 South Africa to virtually zero in 1960 Greece. Based on the Laitin/Hough urban surveys of six republics in the former Soviet Union, and using answers to the question concerning respondent's mother tongue, I have calculated the A-index, and the results are presented in Table 1, row 1.4 A-index scores ranged from a low of .454 in Kazakhstan (where there is basically a division between two mother tongues, Kazakh and Russian, with Russian, in the weighted urban context, accounting for 61 percent of mother tongues) to a high of .561 in Bashkortostan (where there is a division among three mother tongues, Bashkir, Russian, and Tatar). One might surmise from examining these data that these Russian republics were all, more or less, at some midpoint between complete fragmentation and full homogeneity. As we will see, however, the A-index in the post-Soviet context overestimates the degree of linguistic fragmentation.

4 In my calculations, respondents who answered "other" in the survey, i.e., not having one of the mother tongues that were most prevalent in the republics and therefore listed on the answer sheet, were all categorized as having the same mother tongue. This is an unrealistic assumption, but does not significantly alter the results.

TABLE 1 Data from Soviet Census 1989 and Post-Soviet Surveys

| Index | Latvia | Estonia | Ukraine | Kazakhstan | Tatarstan | Bashkortostan |
|---|---|---|---|---|---|---|
| 1. A-Index (mother tongue heterogeneity) (a) | .508 | .520 | .514 | .454 | .543 | .561 |
| 2. Titular who do not speak the titular (i.e., native) language (%) | 1.8 | 0.8 | 17.3 | 17.8 | 25.0 | 18.4 |
| 3. Titular-only/Σ respondents | .161 | .314 | .067 | .161 | .047 | .013 |
| 4. B-Index (mother tongue heterogeneity modified by level of difference between languages) (a) | .459 | .520 | .376 | .454 | .535 | .532 |
| 5. H-Index (probability that residents, meeting randomly, will share a common language) | .872 | .736 | .977 | .907 | .934 | .933 |
| 6. Titular-only or strictly preferred (b) | .285 | .326 | .077 | .060 | .068 | .012 |
| 7. Russian-only or strictly preferred (b) | .535 | .401 | .782 | .827 | .835 | .841 |
| 8. Centrality (percentage of multilinguals who have this language in their repertoire) (T, R, M): 87, 100, 16 61, 96, 71 | | | | | | |
| 9. Monolingualism (c) (Ts, Rs, Ms): 39.5, 97.4, 15.6, 97.9, 19.8 16.1 | | | | | | |
| 10. Bilingualism (c) (Ts, Rs, Ms): 60.0, 2.5, 61.4, 2.0, 73.3 68.2 | | | | | | |
| 11. Trilingualism (c) (Ts, Rs, Ms): .5, .1, 6.9 23.0, .1, 15.7 | | | | | | |
| 12. Redundancy (d) | .222 | .096 | .313 | .063 | .099 | .097 |

Notes:

(a) The A-index for this paper was computed as 1 - (Titular percentage squared + Russian percentage squared + Minority Language percentage squared + "Other" mother tongue squared + [2 × Titular/Russian Bilingual Mother Tongue squared]). The survey percentages were weighted based upon the 1989 census figures for the percentages by nationality of the urban populations by republic. In the Ukraine survey, a significant percentage of respondents listed Russian and Ukrainian as joint mother tongues. For both the A- and B-indices, I split the mother tongue bilinguals in half, with half counted as Russian mother tongue and half Ukrainian mother tongue. If I had assumed that this set of respondents were of the same mother tongue as both the Russian and Ukrainian mother tongue groups, and therefore doubled the square value of its interactions with one another, the Ukrainian A-index would have been .599, and B-index .498.

(b) Weighted probability that two respondents are paired who must rely or strictly prefer relying on indicated language.

(c) For figures on monolingualism, bilingualism and trilingualism, I did a cross tabulation of nationality to ABTR (fluency in Titular and Russian), ABTM (fluency in Titular and the Minority language), ABRM (fluency in Russian and the Minority Language), and ABTRM (fluency in all three languages). I computed ABTRM as a percentage of the total respondents of each nationality for the trilingualism score. I then computed two of the bilingual scores for each nationality (the two that contained the titular language of the nationality), subtracted the number of trilinguals, and divided by the total number of respondents for the bilingualism percentage. The remaining percentage I assumed to be monolingual.

(d) For redundancy, I took the number of random matchings from the weighted version of the surveys. The denominator is the percentage of all matchings that involve successful communication (i.e., the H-index). The numerator is the percentage of all matchings in which both members of the pair are mutually fluent in more than one language.

In Lieberson's study, the A-index serves as a dependent variable. His initial finding was that the general trend has been toward uniformity. In twenty-one of thirty-five countries, there had been a decline in diversity, and with a greater magnitude of change than in the fourteen countries in which there had been an increase. Yet the rate of change toward linguistic uniformity was quite slow. It would take some half millennium for a country to move from .62 (moderately high diversity) to .04 (virtual uniformity). Most of the article addressed the differential rates of change. Relying on an index of linguistic segregation (developed by Wendell Bell), the data showed that diversity decreased to the extent that minority languages were not geographically segregated. Relying on UNESCO data on mother tongue schooling, the data showed that the decline of diversity was much lower when primary education was provided in minority languages. Relying on urbanization data, Lieberson found no consistent pattern linking levels of urbanization to pressures for linguistic uniformity. Relying on geopolitical data, the study reported that World War II occupation by Germany or one of its allies had a significant impact on reducing diversity, that those countries that entered the world system of states between 1914 and 1945 had faster reductions than any other set of countries in levels of linguistic diversity, and that those countries (such as Poland) that physically contracted also had more rapid reductions than the mean in linguistic diversity.

Despite the powerful and provocative findings (and therefore the unrealized potential of using the A-index as an independent variable for studies of ethnic conflict and violence), there are two big problems with the reliance on the A-index as an indicator of a "language community." First, the notion of a "mother tongue" is immensely ambiguous. People report on their mother tongues to census takers in a variety of ways, but with a systematic bias towards the historical language of their ethnic/cultural group and away from the actual speech patterns of their mothers. In Soviet censuses, for example, the question of mother tongue (rodnoi iazyk) received answers that were virtually the same as the question asking about their nationality (natsional'nost), even if respondents could not complete a sentence in their ascribed mother tongues. In the Laitin/Hough surveys (see Table 1, row 2), we found that among urban populations in four of the six republics (all but Estonia and Latvia), the revelation of a "mother tongue" for about a sixth of the titular residents (those respondents whose nationality group is the same as the name of the republic and who answered that they spoke the titular language with difficulty, with great difficulty, or not at all) did not coincide with the claim to speak it well. In Tatarstan, for example, a quarter of the respondents who claimed Tatar as their mother tongue also revealed that they did not speak it well.

In India, Brass (1974) has revealed similar biases such that census data could not be relied on to reveal the dominant language of home life, to the extent there is one, if that is the purpose of knowing a respondent's mother tongue. In fact, it is reasonable to infer from analyses such as Brass's that reports to census takers about mother tongue are more a reflection of the social relations between the central and peripheral cultural groups than they are of the linguistic repertoire of people living in the periphery. If this is the case, studies linking language diversity (by this measure) and social relations between ethnic groups would be tautological.

Second, and more problematical for a notion of a language community, the A-index does not address the issue of the communicative possibilities that exist within a society. If there is a high degree of mother tongue diversity, but there exists a lingua franca or universal multilingualism, we might say that while there is mother tongue diversity, there still exists within a particular country a coherent language community. Looking again at Table 1 (row 3), we see that less than seven percent of the urban surveyed population in Ukraine is fluent only in Ukrainian; in Estonia, with the highest degree of titular monolingualism, only 31.4 percent of the titular speakers are monolingual. Clearly, mother tongue diversity hides a significant degree of community in urban settings of the post-Soviet republics.

Greenberg's B-Index

Greenberg recognized that diversity varies depending on how different the various languages are within a society. The variety of "Germans" in Germany may yield a high diversity score, but the proximity of the languages to each other may make for greater concentration than the A-index would reveal (if in fact in the A-index, High and Low German, inter alia, were coded as different languages). Yet Spanish and Nahuatl in Mexico represent quite different languages, suggesting a greater diversity than would be revealed in the simple A-index. Greenberg therefore suggested a monolingual weighted method, symbolized by B. "For each pair of languages (M, N) the probability of choosing successively a speaker of M and a speaker of N is the product mn, where m and n respectively designate the proportion of M speakers and N speakers to the total population. Each such product is weighted by multiplication with a number between 0 and 1, here called the resemblance factor (r), obtainable [according to Greenberg] as follows: Using arbitrary but fixed basic vocabulary, e.g., the most recent version of the glottochronology list,5 the proportion of resemblances between each pair of languages to the total list is given as a fraction ... the sum of the weighted products subtracted from 1 will give the monolingual weighted index B":

$$B = 1 - \sum_{M,N} (mn)(r_{MN}) \qquad (2)$$

This is an extremely interesting measure. After all, it would distinguish the diversity that exists between Castile and Basque Country (from two different language families) from that which exists between Castile and Catalonia (both Indo-European, Italic, Western, Ibero Romance languages) far better than the A-index, especially if an equal percentage of Basques and Catalans reported their regional languages as their mother tongues. Even if far more Basques reported Castilian as their mother tongue, index B would show that the two autonomous regions of Spain have different forms of diversity, based on two different scores for r.
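A minimal sketch of equation (2) may help make the weighting concrete. It assumes that the sum runs over all ordered pairs of languages, including a language paired with itself at r = 1, so that B reduces to the A-index when every cross-language resemblance is 0; that reading, the function name `b_index`, and the resemblance value 0.4 in the example are assumptions made for illustration and are not taken from Greenberg's data.

```python
def b_index(proportions, resemblance):
    """Monolingual weighted index B of equation (2).

    `proportions[m]` is the population share of language m.
    `resemblance[m][n]` is the resemblance factor r between languages m and n,
    a number between 0 and 1. Same-language pairs are treated as r = 1
    (an assumption of this sketch).
    """
    count = len(proportions)
    weighted_sum = 0.0
    for m in range(count):
        for n in range(count):
            r = 1.0 if m == n else resemblance[m][n]
            weighted_sum += proportions[m] * proportions[n] * r
    return 1.0 - weighted_sum


# Two equally sized groups with a hypothetical resemblance factor of 0.4
shares = [0.5, 0.5]
r_matrix = [[1.0, 0.4],
            [0.4, 1.0]]
print(round(b_index(shares, r_matrix), 3))
```

Under this reading, two equally sized groups score .5 when their languages share nothing (the A-index value) and 0 when the languages are identical, with intermediate resemblance pulling the score toward 0.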

5 Greenberg wrote "glottochronology" which relies on word lists to determine the date when linguistic communities separated, when he probably meant "lexico-statistics," which relies on the same methodology for cross-sectional comparisons.



The search for an indicator for r has not been successful. Glottochronology, the route suggested by Greenberg, has long been in linguistic disrepute, for a variety of reasons. In one schema of statistical similarity, developed by Morris Swadesh (1971, 271-284), a list of common words for which all languages could be compared was stipulated. This schema became the source of an exciting research program that could make reasonable predictions about the period of separation between dialects and therefore played a key role in reconstructing prehistoric migration patterns. But as a tool for comparative linguistics, it had many inadequacies. First, written forms could be quite different from spoken, and there were no general criteria for judging whether or how far two pronunciations needed to be from one another to be coded as a different word. Second, most languages have many words for each of the items on Swadesh's list, and for any word the r-factor could change based on which word in a set of synonyms was chosen. Third, and as earlier indicated, linguistics, especially in the United States, focused almost entirely on structure by the 1960s, and much less on meaning. Therefore languages were coded based on mostly syntactic structures rather than on word relationships. Largely for this last reason, the "most recent version of the glottochronology list" that Greenberg asked his readers to rely on, was never produced, in the form of a covariance matrix of all languages of the world.

Comparative linguists such as Uriel Weinreich held such efforts in considerable scorn. He wrote that in regard to his particular interest, "Great or small, the differences and similarities between the languages in contact must be exhaustively stated for every domain-phonic, grammatical, and lexical-as a prerequisite to an analysis of interference [his dependent variable, and not language distance, but the same problems hold]." He therefore takes one attempt at an enumeration of difference, in which the author "contents herself with a four-page outline of the differences between eleven languages as unlike as English, Cantonese, and Tagalog" as an exercise that "cannot serve the linguist" (1953, 2).6 Within linguistics, structuralists have had no interest in seeking measures of difference, and descriptivists have had no interest in parsimony. The result is that the discipline has not produced useful measures of linguistic difference.

Fearon and I (Fearon and Laitin 1997) have taken a different tack in regard to collecting data on language difference, whether to be used for an Index B, or merely to ask whether groups with more distinct languages are more or less likely, other things being equal, to assimilate, integrate, or engage in violent conflict. We took the world classification of languages, produced by Ethnologue (Grimes 1996), a society of linguists interested in producing versions of the Bible in all languages of the world. Ethnologue linguists rely on linguistic trees, classifying languages by structure, with branch points for language family (e.g., Indo-European from Afro-Asiatic), language groups, and down to subdialects, as I have already delineated in my classifications of Castilian and Catalan. With Ethnologue data we created a variable measuring the distance between languages that we called LANGFAM. If the two languages are of different language families, as with Spanish and Basque, the score for LANGFAM is 1, but if they break off on the fifth branch from one another, as do Akan from Ewe (two Ghanaian languages), the score is 5. The higher the number, the greater the language similarity.7

This measure of language distance is not without its own problems. First, since we used it to measure the cultural distance between a minority and the dominant group in a country, we faced the problem that there is no accepted criterion for judging the language of the dominant group or the minority. Our criterion was to code the historic language of the country's political leadership as the dominant language (and thus the dominant language of Kenya changed when Jomo Kenyatta, a Kikuyu, died and power was transferred to Daniel Arap Moi, a Kalenjin), and the historic language of the minority (and thus Germans in Russia are coded as German speakers even though most cannot speak German). This made sense to us as a general rule, but this decision still appears rather arbitrary, and we might decide to change our criterion of dominant language to that language used normally in high-status public domains. Second, there are problems in Ethnologue's classification of languages, in part due to the fact that across language families, the data are not equally sensitive to dialectal differences in different regions. Since Ethnologue linguists have a greater interest in preparing Bible translations for heathens, they have been more sensitive to small differences in Papua New Guinea than in Germany. And so, the data may overstate linguistic differences among non-Christians. A third problem, as was indicated earlier, is that structural differences are not a good proxy for communicative difficulties. While Castilian and Mexican Spanish are closely related, and equidistant from English in Ethnologue's classification, the interference of English-speakers in Baja California is so great as to make Spanish spoken there sound somewhat like a dialect of English, with higher levels of mutual intelligibility than would be predicted from structural distance. More generally, "linguistic distance" only in part determines communication difficulties. Language distances can be shortened when people are accustomed to meeting strangers, when they have a strong will to understand one another, and when the context of communication (e.g., agreeing on a price in a market rather than interpreting a poem) involves simple declarative speech.

6 See Mackey (1976: 281-307) for an even more complicated formula based on calculations of differences between sets of juxtaposed sentences. Perhaps with high-speed computers, an algorithm can be written that would allow a matrix of distances between all language dyads using Mackey's measure, but the much-easier-to-collect measure discussed below may well be sufficiently discriminating and externally valid as to serve our purposes.

7 For purposes of "r," I normalized LANGFAM from 0 to 1: a break at the first branch (r = 0), a break in the second branch (r = .2), a break in the third branch (r = .4), a break in the fourth branch (r = .6), a break in the fifth branch (r = .8), and the same language (r = 1).
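The normalization of LANGFAM into r described in footnote 7 can be written out as a small function. This is only an illustrative sketch: the function name and its arguments are hypothetical, and it assumes that LANGFAM takes integer values from 1 to 5, with identical languages handled as a separate case.

```python
def langfam_to_r(langfam, same_language=False):
    """Map a LANGFAM branch score to the resemblance factor r (footnote 7).

    Hypothetical helper: a break at branch 1 gives r = 0, branch 2 gives .2,
    and so on up to branch 5 (r = .8); the same language gives r = 1.
    """
    if same_language:
        return 1.0
    if langfam not in (1, 2, 3, 4, 5):
        raise ValueError("LANGFAM is assumed to be an integer from 1 to 5")
    return (langfam - 1) / 5


# Examples taken from the text: Spanish/Basque break at the first branch,
# Akan/Ewe at the fifth.
r_spanish_basque = langfam_to_r(1)
r_akan_ewe = langfam_to_r(5)
```

Under this mapping, a dyad from different language families contributes nothing to resemblance, while a dyad separating only at the fifth branch counts as four-fifths similar.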

A fourth problem is that language distance may be subject to the same problem of endogeneity (where the value of the independent variable is affected by a changed value on the dependent variable) discussed above in reference to the enumeration of groups. Consider these examples of political "divorce" between languages: Hindi/Urdu; Serb/Croatian; Romanian/Moldavian; and Russian/Ukrainian. In such cases, ethnic entrepreneurs seeking political independence work with local linguists in order to impose a new standard dialect for a language so that it is maximally different from the official language of the once central state. The enlarged difference between the separatists' language and the former language of central rule would then be the result of political conflict, not the cause of it. Despite these difficulties, Ethnologue data are available and give a rough and ready measure of linguistic difference, and were we to use the B-index for the identification of a language community, the branching data from Ethnologue rather than a nonexistent glottochronological covariance matrix are the best available source.

Running the B-index for the six republics of the Laitin/Hough survey, with the data on Table 1 (row 4), we can see that Ukraine's mother tongue diversity is substantially reduced in the B-index (.376) compared to its score on the A-index (.514). Given the proximity of Ukrainian to Russian, and the relative ease in understanding across these two languages, the B-index gives a more realistic sense of the existence of a mother-tongue language community than did the A-index. Also of interest is that Latvia, with a very close score to Estonia on the A-index (Latvia is more homogeneous by .012), shows much less diversity on the B-index (Latvia is more homogeneous by .061), due to the relative proximity of Latvian to Russian, compared to Estonian and Russian. In the two republics where we had data on minority languages, the B-index shows less diversity than does the A-index due to the close proximity of Bashkir to Tatar in Bashkortostan, and of Tatar to Chuvash in Tatarstan. The scores on the B-index are thus closer to my intuitions of language diversity in these republics than is the A-index.
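To make the contrast between the two indices concrete, the following sketch computes both from mother-tongue shares. It rests on an assumption: the indices are defined in an earlier section, and the code takes them to have Greenberg's standard form, with A equal to one minus the sum of squared shares and B equal to one minus the sum over all ordered pairs of the product of the two shares and their resemblance r. The shares and the r value are invented for illustration and are not taken from Table 1.

```python
def a_index(shares):
    """Assumed form of the A-index: 1 minus the sum of squared shares."""
    return 1.0 - sum(p * p for p in shares.values())


def b_index(shares, resemblance):
    """Assumed form of the B-index: 1 minus the r-weighted sum over pairs.

    shares: dict mapping language -> share of the population.
    resemblance: function (lang_m, lang_n) -> r in [0, 1] for m != n.
    """
    total = 0.0
    for m, p_m in shares.items():
        for n, p_n in shares.items():
            r = 1.0 if m == n else resemblance(m, n)
            total += p_m * p_n * r
    return 1.0 - total


# Hypothetical two-language population whose languages split at the
# fifth branch (r = .8 under the normalization of footnote 7).
shares = {"X": 0.6, "Y": 0.4}
a = a_index(shares)                      # about .48
b = b_index(shares, lambda m, n: 0.8)    # about .096
```

With r set to zero for every dyad the two functions coincide, which is one way to see the B-index as the A-index discounted by linguistic proximity.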

Our LANGFAM index has problems in that it makes some disputable judgments and in some cases its value is endogenous with the political outcomes it is supposed to explain. Nonetheless, as is apparent in the comparison of the post-Soviet data on the A- and B-indices, our coding of language distance is a potentially useful parameter for any future index of language community.

Greenberg's H-Index

Greenberg recognized that the key to linguistic homogeneity might not be in the sharing of a mother tongue or even having proximate mother tongues, but rather the sharing of any language enabling two people in a community to communicate. His H-index, which he calls the index of communication (1956, 112), "is the probability that if two members of the population are chosen at random, they will have at least one language in common." So, if there are three languages, A, B, and C, for computing the H-index we need to know the percentage of the population with the following seven repertoires: A only; B only; C only; A&B; A&C; B&C; and A&B&C. The sum of the products of these sets where there is communication (e.g., A meeting A&C) constitutes the index, as expressed below.

H = A*A(1) + 2*A*B(0) + 2*A*C(0) + 2*A*AB(1) + 2*A*AC(1) + 2*A*BC(0) + 2*A*ABC(1)
  + B*B(1) + 2*B*C(0) + 2*B*AB(1) + 2*B*AC(0) + 2*B*BC(1) + 2*B*ABC(1)
  + C*C(1) + 2*C*AB(0) + 2*C*AC(1) + 2*C*BC(1) + 2*C*ABC(1)
  + AB*AB(1) + 2*AB*AC(1) + 2*AB*BC(1) + 2*AB*ABC(1)
  + AC*AC(1) + 2*AC*BC(1) + 2*AC*ABC(1)
  + BC*BC(1) + 2*BC*ABC(1)
  + ABC*ABC(1)    (3)
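Equation (3) generalizes readily to any number of languages: each repertoire label stands for that repertoire's share of the population, and a pair contributes to the sum only when the two repertoires overlap. The sketch below is one way to compute it. The function name and the repertoire shares are hypothetical, chosen only so that the shares sum to one.

```python
from itertools import product


def h_index(repertoire_shares):
    """Probability that two randomly drawn people share at least one language.

    repertoire_shares: dict mapping a frozenset of languages -> population share.
    Iterating over ordered pairs reproduces the A*A and 2*A*B terms of
    equation (3).
    """
    total = 0.0
    for (rep1, p1), (rep2, p2) in product(repertoire_shares.items(), repeat=2):
        if rep1 & rep2:  # nonempty intersection: communication is possible
            total += p1 * p2
    return total


# Hypothetical shares for the seven repertoires of the three-language case.
shares = {
    frozenset("A"): 0.40,
    frozenset("B"): 0.20,
    frozenset("C"): 0.10,
    frozenset("AB"): 0.10,
    frozenset("AC"): 0.10,
    frozenset("BC"): 0.05,
    frozenset("ABC"): 0.05,
}
h = h_index(shares)  # about .62 for these invented shares
```

Subtracting the result from one gives the corresponding index of linguistic noncommunication.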

Greenberg relied on the 1930 census of Mexico, which had good statistics on bilingualism, and computed that while the all-Mexico A-index was .3122, the H-index was .8386. Converting the H-index to one of linguistic noncommunication (1 - .8386), the figure is .1614, about half the degree of diversity shown by the A-index. While under the A-index, Mexico in 1930 may not have looked like a language community, from the point of view of the H-index, it seems like one.

In computing the H-index from the Laitin/Hough survey data, and unlike for indices A and B, where the respondents' reported mother tongues were the source of information, I have relied solely on respondents' reports on their fluency in a set of languages. Since in many cases respondents report that they are not fluent in their reported mother tongues (Table 1, row 2), my computation of the H-index is more an index of communication possibilities than it is of ethnic solidarity. As shown in Table 1 (row 5), if pairs of residents in urban areas of these republics were matched randomly, from 74 percent (in Estonia) to 98 percent (in Ukraine) would have a language in common in which both interlocutors spoke fluently. By measure of the H-index, the urban areas of all six republics were full-fledged language communities.

Lieberson (1981, 318) developed the H-index in a variety of ways. The most interesting is his disaggregation based on language power. For any two languages, A & B, the data that go into the H-index can be ordered so that we know the conditions under which (1) A must be used; (2) B must be used; (3) Either can be used, but (a) where A is favored as it is the principal language of two bilingual A & B speakers; (b) where B is favored as it is the principal language of two bilingual B & A speakers; and (c) where there is no clear preference, as the two bilingual speakers have different principal languages; and (4) No communication is possible. A rather interesting notion of community in a multilingual society might be one that reflects the percentage of c-type communications, which might be called "redundancy" and thus be a criterion of a more egalitarian communication system.
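Lieberson's disaggregation can be illustrated for the two-language case with a short sketch. It is not his procedure, only one reading of the categories listed above: each respondent type is described by a repertoire and a principal language (assumed to belong to that repertoire), and every ordered pair of types is assigned to one category. All names and population shares are hypothetical.

```python
from collections import defaultdict
from itertools import product


def classify_pair(person1, person2):
    """Assign a pair of (repertoire, principal language) types to a category."""
    (rep1, best1), (rep2, best2) = person1, person2
    shared = rep1 & rep2
    if not shared:
        return "no communication"
    if len(shared) == 1:
        return "must use " + next(iter(shared))
    if best1 == best2:
        return "either, " + best1 + " favored"
    return "either, no clear preference"


def lieberson_breakdown(population):
    """population: dict mapping (frozenset repertoire, principal) -> share."""
    breakdown = defaultdict(float)
    for (type1, p1), (type2, p2) in product(population.items(), repeat=2):
        breakdown[classify_pair(type1, type2)] += p1 * p2
    return dict(breakdown)


# Hypothetical population: monolinguals in A and B, plus bilinguals who
# differ in their principal language.
population = {
    (frozenset("A"), "A"): 0.5,
    (frozenset("B"), "B"): 0.2,
    (frozenset("AB"), "A"): 0.1,
    (frozenset("AB"), "B"): 0.2,
}
result = lieberson_breakdown(population)
```

For these invented shares, A is required in a majority of encounters while only a small fraction fall in the no-clear-preference category, the kind of inegalitarian pattern the disaggregation is meant to expose.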

My data from the Soviet republics allow me to make preliminary codings for six urban samples on the Lieberson extension. Comparing rows 6 and 7, one can see that there was a highly inegalitarian language community in the post-Soviet republics. In our urban samples of all republics, Russian is either necessary or strictly preferred by both interlocutors (both consider Russian to be the language they speak best; or one considers Russian to be the language he or she speaks best and the other interlocutor is indifferent between Russian and another language) in far more cases than is the case with the titular languages. This is the situation that many of the nationalist politicians have sought to reverse. Furthermore, random interlocutors could communicate with each other in more than one language (measured as "redundancy" in Table 1, row 12) in 31 percent of the possible interactions in Ukraine and 22 percent of them in Latvia, but in less than 10 percent of the interactions in the cities of the other republics.
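Redundancy in the sense just used, the share of random pairings in which the interlocutors have more than one language in common, can be computed from the same repertoire shares that feed the H-index. The sketch below is illustrative only; the function name and the shares are hypothetical and do not reproduce Table 1.

```python
from itertools import product


def redundancy(repertoire_shares, min_shared=2):
    """Share of random pairs with at least `min_shared` languages in common.

    repertoire_shares: dict mapping a frozenset of languages -> population share.
    """
    total = 0.0
    for (rep1, p1), (rep2, p2) in product(repertoire_shares.items(), repeat=2):
        if len(rep1 & rep2) >= min_shared:
            total += p1 * p2
    return total


# Hypothetical urban sample: half speak only Russian, a tenth only the
# titular language, and the rest are fluent in both.
sample = {
    frozenset({"Russian"}): 0.5,
    frozenset({"Titular"}): 0.1,
    frozenset({"Russian", "Titular"}): 0.4,
}
share_redundant = redundancy(sample)  # about .16: only bilingual pairs qualify
```

Setting min_shared to 1 recovers the H-index, so the two measures differ only in the threshold applied to the overlap.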

Despite great advantages of the H-index for the coding on the variable "language community," we have virtually no data on the language repertoires of populations in other countries that are commensurate with the Laitin/Hough surveys. It would not be easy to collect such data with any hope of cross-country reliability, and this for several reasons. First, as I mentioned in regard to the asymmetric status of languages, the criteria people use to determine whether they speak another language are quite different across countries, and across languages within a country. Second, cross-national wordings of questions can give quite different signals to respondents, yielding bias in the reported repertoires. In the post-Soviet surveys reported here (see Appendix), respondents were asked whether they speak (vladet') particular languages. Since vladet' connotes possession or mastery, this might have set a higher criterion for respondents to acknowledge than if we had used govorit', which merely implies the capability of saying or telling things in a particular language. There is no way to precisely calibrate the connotation of the terms used in a survey instrument across languages (Hymes 1970). Third, as noted earlier, respondents systematically misreport their language behavior to those who question them about it.

Yet the probability of being able to communicate effectively with nearly all interlocutors, the key idea in the H-index, comes close to what we mean by a language community, and it is worthwhile to develop survey and observational techniques to procure raw data for cross-country H-indices. One way to address the problems of cross-national survey bias is to have complementary ethnographic reports on subsamples of the surveyed populations, in which participant observation techniques could be used to assess the direction and magnitude of the bias in each of the surveys, allowing for correction parameters. More ambitious would be to get stratified samples of country populations and use standard second (and if appropriate third) language testing techniques (such as dictation and measurements of the speed of oral production) that are designed to measure achievement. More ambitious still would be to do more in-depth (presumably with a lower number in the sample) "linguistic background scales" in which trained interviewers would be able to assess language skills, languages used for prestige and nonprestige functions, and intergenerational patterns of maintenance.8

Language Regimes as Communities

The equation of a common mother tongue with a language community (although noticing that Switzerland was an exception) is not precisely what Deutsch and other theorists of the nation were thinking about when they theorized about language communities. Yet high levels of communicative success may not be quite right either, inasmuch as a notion of a community entails a set of normative rules about what languages one ought to learn and what languages one ought to speak in specific domains. In Laitin (1992), I suggested that there are language configurations within state boundaries that are both multilingual and reflective of a national community. In this section, I try to indicate how a multilingual language community or language regime might be specified more precisely. Having shown how this could be done, I suggest that future data collection on language regimes should differentiate on two separate dimensions: the first would be a categorical variable indicating the country's type of language community; the second would be a continuous variable measuring the degree of fulfillment for each type of community.

8 Baetens Beardsmore (1986) discusses the opportunities and dangers (especially in regard to the use of censuses) of measuring bilingualism. His discussion should form the basis of any large-n research project seeking to get valid measures of the H-index. To my knowledge, no linguist has attempted to measure bilingualism using the best of these techniques so as to make inferences about the language situation in a country or a large region within a country.

As a preliminary matter, I shall discuss the notion of a "language situation," which is defined by Charles Ferguson as "the total configuration of language use at a given time and place, including such data as how many and what kinds of languages are spoken in the area by how many people, under what circumstances, and what the attitudes and beliefs about languages held by the members of the community are" (1966, 309). In this schema he reduces a vast amount of information into an algebraic form, such that the reader will know all the languages used in a geographical area, the functions in which they are used, the probability they will be used for those functions, and the status of the language controlled for functional domain. No one (to my knowledge) has ever coded another case after Ferguson's sketch of the Ethiopian language situation. But even had others done so, there would be no way to construct a variable from those data that would have told us whether in Ethiopia there was a language community or what value Ethiopia had on a dimension of language community.

After reviewing Ferguson's and other attempts to categorize language situations, Laitin (1992) offered a categorization of states based on the language repertoires that are necessary for citizens of any state to assure them a wide range of mobility opportunities within domestic political, economic, and social institutions. A categorization of "language regimes" can be derived based on the notion of necessary (and normatively valued) language repertoires. The goal is to specify a set of easily codable but distinguishable language regimes, keeping in mind that in each type, the degree to which a country is a language community would be a variable.

Rationalized Language Communities

The first critical distinction is whether the state has "rationalized" language or whether there is a "multilingual regime." Rationalization, the authoritative imposition of a single language for educational and administrative communications, is a concept derived from Max Weber (1968), who used the term to refer to modern state practices of standardization and bureaucratization. A common currency, a common legal system, and a unified tax code are all examples of rationalization, as would be a common administrative language. For purposes of coding "language communities," states that are rationalizing are "language communities" to the extent that their A-indices approach unity. States can achieve language rationalization by three different methods.

(R1) Rationalization through the recognition of a lingua franca. This form of rationalization is evident when there is a language spoken widely and understood practically universally within the boundaries of a state, but this language is not associated as the mother tongue of a significant language-group living within that state. Swahili in Tanzania, Bahasa in Indonesia, English in the U.S., and perhaps even English in England are examples of rationalization through the state recognition of a lingua franca as practiced in society.

(R2) Rationalization through the recognition of the language of a majority group. French in France, Han Chinese in China, and Kyotsugo Japanese in Japan are examples of a dominant language group having the power to impose its standard on a wider society.

(R3) Rationalization through the recognition of the language of a minority group. Here the rationalization of Spanish by Mestizos in South America, Haile Selassie's policy to impose Amharic on Ethiopia, and Afrikaner attempts to make Afrikaans the rationalized language of South Africa are examples.

Multilingual Language Communities

If states have not sought rationalization or were compelled by political pressure to recognize linguistic rights of minority populations, we can say that these states have multilingual regimes. There are two different types of multilingual regimes that might be distinguished.

(M1) Multilingual regimes with individual multilingual repertoires. M1 regimes demand that individuals cultivate language repertoires that reflect different languages needed for different functional domains, e.g., for official regional affairs, for economic exchange in large businesses, for official business with the central state, for local services such as hospitals and primary schools. In India there is a well-established (but not formally recognized) 3±1 language regime. Here, Indians with aspirations for a wide range of mobility opportunities must know Hindi (the language of much popular culture and some state documents), English (the language of the higher civil service and big business), and the state language (used for most state services and education). This is a three-language formula. For those who live in a state where Hindi or English is the state language, only two (3-1) languages are necessary for one's repertoire. For those who are minorities within states where Hindi and English are not state languages, and seek minority rights, their people need to know four (3+1) languages: English, Hindi, the state language, and their minority language. While no country has precisely India's configuration, several African countries (e.g., Congo and Nigeria) are moving toward such an outcome.

Another variant of an M1 regime, a 2±1 outcome, occurs when there is official rationalization in one language with legalized regional autonomies relying on other languages. This is the case where a single language becomes the de jure official language of the state, and everyone with reasonable mobility prospects is literate in it. However, a region or set of regions gets limited autonomy rights for official use of the regional language in a range of prescribed domains. It is even possible that regions can successfully compel states to require some state bureaucrats to be literate in a regional language as well. Examples include Canada (with Quebec), Spain (with Basque Country, Galicia, and Catalonia), Algeria, the emergent European state (Laitin 1997), and Russia (with a large number of republics having their own official language). A methodological problem arises because the coding of M1 regimes must incorporate a dimension capturing the degree to which there is a language community.

Here I propose three quantitative measures of an M1 regime to determine whether it constitutes a language community. Each of them captures part of the orderliness that helps give M1 regimes a sense of being a community where there are norms about language choice, norms that might make little sense if the goal were to maximize communicative range. First, relying on an indicator developed by Abram de Swaan (1993), there must be at least one language with high "centrality." This means that a high percentage of all multilingual speakers must share at least one common language. This indicator is computed by selecting from the population all people who speak more than one language and seeing what percentage of that population speaks each of the enumerated languages. As at least one language reaches 100 percent, there is maximum "centrality." In Cameroon, for example, there is a division between those who have added French and those who have added English to their repertoires. Due to a potentially low score on centrality, Cameroon would be coded as an M1 regime with a low level of community. Second, the M1 regime that I have just outlined requires that those who claim the "central" language as their mother tongue tend to have the highest incidences of monolingualism in the population. It requires as well that those who belong to regionally based language groups have the highest incidences of bilingualism. It also requires that minorities within regions who have neither the central nor the regional languages as their mother tongues have the highest incidences of trilingualism. Third, as I foreshadowed in my discussion of the H-index, an M1 regime to be a community requires that people in each of the categories (e.g., central language as mother tongue; regionally based nationalities; minorities within regions) have the same set of languages in their repertoires rather than a complementary set. Therefore, in an M1 regime, there will be high redundancy of language competence. Or another way to put it is that the strategy of language acquisition will not look as if people are trying to increase the H-index; rather they are trying to mimic the language repertoires of those similarly situated in the multilingual society in which they live.
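The first two measures lend themselves to a compact illustration. The sketch below computes centrality as described above (the percentage of multilingual speakers who have each language in their repertoire) and tabulates the incidence of mono-, bi-, and trilingualism by group. The function names, the group labels, and the eight-respondent sample are hypothetical and serve only to show the calculation.

```python
from collections import Counter, defaultdict


def centrality(repertoires):
    """Share of multilingual speakers who speak each language."""
    multilinguals = [rep for rep in repertoires if len(rep) > 1]
    if not multilinguals:
        return {}
    languages = set().union(*multilinguals)
    return {
        lang: sum(lang in rep for rep in multilinguals) / len(multilinguals)
        for lang in sorted(languages)
    }


def repertoire_size_by_group(respondents):
    """For each group, the share of members knowing 1, 2, 3, ... languages."""
    counts = defaultdict(Counter)
    for group, rep in respondents:
        counts[group][len(rep)] += 1
    return {
        group: {size: n / sum(c.values()) for size, n in c.items()}
        for group, c in counts.items()
    }


# Hypothetical respondents: (group, set of languages spoken fluently).
respondents = [
    ("central", {"Russian"}),
    ("central", {"Russian"}),
    ("central", {"Russian", "Tatar"}),
    ("titular", {"Tatar", "Russian"}),
    ("titular", {"Tatar", "Russian"}),
    ("titular", {"Tatar"}),
    ("minority", {"Chuvash", "Tatar", "Russian"}),
    ("minority", {"Chuvash", "Russian"}),
]
central_scores = centrality([rep for _, rep in respondents])
size_profile = repertoire_size_by_group(respondents)
```

In this invented sample every multilingual speaks Russian, so Russian has maximum centrality, and the size profile can be read against the normative ordering (monolingual central group, bilingual titulars, trilingual minorities) that the second measure requires.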

The Soviet Union was an M1 regime. To be sure, it had elements of a rationalized language regime. Nearly all pupils in the Soviet Union were required to study Russian, and social mobility through membership in the Communist Party required facility in Russian. Yet (Table 1, row 3) more than 30 percent of Estonian respondents claimed not to have facility in Russian, and (row 7) fewer than half the urban respondents strictly preferred Russian in their daily interactions. Also, because each republic had considerable autonomy in promoting its titular language (and in 1989 all passed language legislation that to various degrees gave preeminence to the titular languages for official business in the republics), the Soviet Union had elements of a pillarized system to be discussed below. But because of the high level of individual multilingualism (in which the titular languages and Russian had complementary functional domains) the Soviet Union is best considered an M1 language regime. Furthermore, my data on two of the six former Soviet republics suggest that the Soviet Union's language regime certainly approached "community" standards.9 On the dimension of centrality (row 8), 100 percent of the bi- or trilinguals in Tatarstan and 96 percent in Bashkortostan are fluent in Russian. This means that in both republics, there is a focal language of wider communication. The second place central language for bilinguals is 13 percent behind Russian in Tatarstan and 25 percent behind Russian in Bashkortostan. On the spread of monolingualism, bilingualism, and trilingualism (rows 9-11), these two republics had a normative order quite consistent with a 3±1 language situation. Russians (by nationality) were far more likely to be monolinguals in both republics. The minorities had relatively high trilingual scores in both republics (far higher than the Russians), but in Bashkortostan the titulars were more likely to become trilingual than the minority. As for bilingualism, the criteria for a 3±1 language community would demand that the titulars have the highest incidences; the data show that in the urban areas, it is the minorities that have the highest bilingualism quotients. There is thus only moderate support for the existence of a normative order of who learns what languages in Bashkortostan and Tatarstan. On the degree of redundancy (row 12), the data show far greater redundancy in Latvia and Ukraine than in Tatarstan and Bashkortostan, but in general not a very high degree anywhere. Despite the low levels of redundancy, there is far more fluency than would be demanded for effective interlocutor communication, and this is an indicator of a moderately high score for the language community dimension.

9 In the surveys in these two republics (Tatarstan and Bashkortostan), I have information on the titular language, Russian, and a single minority language. It would have been better if I had data on each respondent's full repertoire. For the other four republics, I have data only on Russian and the titular language. Thus all bilinguals would be fluent in both of these languages, and both languages would come out as equally and completely central. I thus compute the measures for a multilingual language community only for Tatarstan and Bashkortostan.

We might want to go beyond the three measures to determine whether an M1 regime was a language community by getting some behavioral data. For example, we might want to observe random interactions in a variety of domains, such as markets, hospitals, ticket booths, etc., and measure the time before which interlocutors have figured out the language in which they can communicate. If there were a language community, people would know the norms of what languages are used in what domains, and the adjustment of language to situation would be very fast. Such data gathering would require quite sophisticated observational techniques, but could be collected in conjunction with the participant observation that would complement the surveys on language repertoires.

(M2) Multilingualism through Pillarization. In this multilingual configuration, there is societal multilingualism but no need for individual multilingualism. Each region under pillarization has equal rights to write laws, to impart education, and to administer society in its own language. There is no necessity for a citizen living in one pillar to learn the language spoken in regions of the other pillars, but there is a minimal level of bilingualism for those who develop a specialty in all-pillar governance. The exemplary cases of multilingualism through pillarization are Switzerland and Belgium (McRae 1983 and following).

In Switzerland (but decreasingly so in Belgium) there are cultural phenomena which provide glue to the society outside of language. A pillarized configuration is one in which there is a high A-index in each region, but a relatively low H-index for the country at large.

Conclusion

Theories of nationalism, democracy, regional assertiveness, and civil war have relied on vague and unspecified notions of cultural (or linguistic) heterogeneity. Concerning language, the criterion of a common "mother tongue" is usually relied on, but it is a conceptually inadequate measure. Under conditions of mother tongue diversity, the criterion for a language community that requires structural proximity of languages is equally problematical, and here linguistics gives us little theoretical guidance or empirical data. Another criterion of language diversity measures the probability that two random people in the country will share a language. Not only are there empirical problems here (on getting standards of "knowing" a language), but there are theoretical ones as well (concerning whether the ability to communicate is sufficient for an indicator of cultural homogeneity). In light of these problems, I suggest a typology of language communities. For one of them (the multilingual community of type M1) I postulate three complementary measures (centrality, a normative rule as to who learns what languages, and redundancy) to measure the degree of community-ness.

Data collected from post-Soviet republics illustrate the possibility of collecting data to code political units on the basis of these criteria. Although the number of observations is too low to perform any statistical analysis, the data on Table 1 are illustrative of the indices that could become standard measures of ethnolinguistic fragmentation and reflect better linguistic realities than do current indices. The relationship of the level of "language community" to such outcomes as democratic stability, ethnic violence, and economic growth could be, if data were systematically collected on language repertoires of populations, tested with greater confidence in the external validity of the measures.

To the extent that measures of language community can help to test theories of ethnic diversity, it would be useful to develop databases on two dimensions. The first dimension would be a categorical variable that would discriminate among three possible outcomes: (a) rationalization; (b) the development of a multilingual language community of type M1; or (c) M2. The second dimension would be an indicator of the degree to which each of these types of language communities is fulfilled. For rationalization, I would recommend use of the B-index. For M1 communities, there would need to be an index combining scores on centrality and redundancy, as well as an assessment as to the existence of normative rules about who learns what languages. For M2 communities, the measure would rise with a high B-index for each region of the state and decline with the rise of the H-index for the state as a whole. Each of these two dimensions could serve as an independent variable (or they could be combined, if theoretically elaborated) in explanations for democratic breakdown, regional assertiveness, and inter-ethnic conflict and cooperation.

But concern with findings goes beyond my purposes here. I hope that this paper will induce political scientists to analyze and modify the proposals herein. Once the community of political scientists coordinates on which of these measures of ethnic diversity are worth developing, there will be incentives for field researchers to collect comparable data (to the extent such data can be comparable) across countries.

Manuscript submitted November 4, 1998. Final manuscript received May 24, 1999.

Appendix: Notes on the Laitin/Hough Survey

Jerry Hough and I collected data in six republics of the former Soviet Union from which it would be possible to construct both A and H indices. As part of an NSF-supported research project, Hough and I commissioned surveys in four former Union Republics (Estonia, Latvia, Ukraine, and Kazakhstan) and two former Associated Republics within the Russian Federation (Bashkortostan and Tatarstan). With support from the Harry Frank Guggenheim Foundation, I have collected but not yet analyzed data from two additional former Union Republics, Moldova and Azerbaijan.

In the surveys, we asked a large sample of the urban population questions concerning (a) their "mother tongues," (b) the language they speak best, (c) their command of the titular language of the republic, (d) their command of Russian, and, for the two former ASSRs, (e) their command of a minority language (Tatar in Bashkortostan, Chuvash in Tatarstan). For questions concerning "command," we asked how freely they speak (vladet') the particular language. Those who answered that they think in it or freely speak it were considered conversant in the language. Those who answered that they speak it with difficulty, with great difficulty, or do not speak it at all were considered nonconversant. From the answers to these questions, and relying on Greenberg's indices, it became possible to develop a set of indicators of a language community with different emphases.
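The coding rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for the survey: the answer labels, the function names, and the four respondents are hypothetical. It also assumes Greenberg's (1956) standard definitions, with the A-index computed as one minus the sum of squared mother-tongue shares and the H-index as the probability that two people drawn at random (with replacement) share at least one language in which both are conversant.

```python
from collections import Counter

# Assumed answer codes; the labels are illustrative, not the survey's wording.
CONVERSANT_ANSWERS = {"think_in_it", "speak_freely"}


def is_conversant(answer):
    """Dichotomize a 'command' answer as the appendix describes."""
    return answer in CONVERSANT_ANSWERS


def repertoire(commands):
    """Set of languages in which a respondent counts as conversant."""
    return frozenset(lang for lang, ans in commands.items() if is_conversant(ans))


def a_index(mother_tongues):
    """Greenberg's A (assumed form): 1 minus the sum of squared mother-tongue shares."""
    n = len(mother_tongues)
    counts = Counter(mother_tongues)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def h_index(repertoires):
    """Greenberg's H (assumed form): share of ordered pairs with a common language."""
    n = len(repertoires)
    counts = Counter(repertoires)
    shared = sum(
        ci * cj
        for ri, ci in counts.items()
        for rj, cj in counts.items()
        if ri & rj  # non-empty intersection means a shared language
    )
    return shared / (n * n)


# Made-up respondents for illustration only; these are not survey data.
respondents = [
    {"mother": "Russian", "commands": {"Russian": "think_in_it", "Estonian": "with_difficulty"}},
    {"mother": "Russian", "commands": {"Russian": "think_in_it", "Estonian": "speak_freely"}},
    {"mother": "Estonian", "commands": {"Russian": "not_at_all", "Estonian": "think_in_it"}},
    {"mother": "Estonian", "commands": {"Russian": "speak_freely", "Estonian": "think_in_it"}},
]
reps = [repertoire(r["commands"]) for r in respondents]
a = a_index([r["mother"] for r in respondents])
h = h_index(reps)
print(a, h)
```

In this toy example the mother-tongue criterion alone suggests an evenly divided population, while the repertoire-based index shows that most pairs can nonetheless communicate, which is the contrast the two indices are meant to capture.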

The samples for these republics were not determined with either of these indices in mind. They overrepresent minorities and Russians. For this reason, in measuring Greenberg's indices, I weight the cases by the percentages of urban residents of each nationality in each republic as revealed in the 1989 Soviet census.
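One plausible reading of this weighting step is a simple post-stratification adjustment, in which each case is weighted by the ratio of its nationality's census share to that nationality's share of the sample. The sketch below is written under that assumption; the function names and all numbers are hypothetical and are not drawn from the survey or from the 1989 census.

```python
from collections import Counter


def poststratification_weights(sample_nationalities, census_shares):
    """Weight each case by census share / sample share of its nationality."""
    n = len(sample_nationalities)
    sample_shares = {k: c / n for k, c in Counter(sample_nationalities).items()}
    return [census_shares[nat] / sample_shares[nat] for nat in sample_nationalities]


def weighted_a_index(mother_tongues, weights):
    """A-index (1 minus sum of squared shares) computed on weighted shares."""
    total = sum(weights)
    mass = {}
    for tongue, w in zip(mother_tongues, weights):
        mass[tongue] = mass.get(tongue, 0.0) + w
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


# Made-up sample in which Russians are overrepresented (3 of 4 cases),
# against an invented census split of 50/50.
nationalities = ["Russian", "Russian", "Russian", "Titular"]
mother_tongues = ["Russian", "Russian", "Russian", "Titular"]
census_shares = {"Russian": 0.5, "Titular": 0.5}

weights = poststratification_weights(nationalities, census_shares)
unweighted = weighted_a_index(mother_tongues, [1.0] * len(mother_tongues))
weighted = weighted_a_index(mother_tongues, weights)
print(unweighted, weighted)
```

With these invented figures the unweighted index understates diversity because the oversampled group dominates the shares; reweighting to the census proportions restores the evenly split value.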

References

Baetens Beardsmore, Hugo. 1986. Bilingualism: Basic Principles. Avon: Multilingual Matters Ltd.

Baetens Beardsmore, Hugo, and Els Witte. 1987. The Interdisciplinary Study of Urban Bilingualism in Brussels. Philadelphia: Clevedon.

Brass, Paul. 1974. Language, Religion and Politics in North India. Cambridge: Cambridge University Press.

Chomsky, Noam. 1957. Syntactic Structures. The Hague: Mouton.

Connor, Walker. 1994. Ethnonationalism: The Quest for Understanding. Princeton: Princeton University Press.

Dahl, Robert. 1971. Polyarchy. New Haven: Yale University Press.

De Swaan, Abram. 1993. "The Evolving European Language System: A Theory of Communication Potential and Language Competition." International Political Science Review 14:241-256.

Deutsch, Karl. 1954. Nationalism and Social Communication. Cambridge: MIT Press.

Dorian, Nancy. 1981. Language Death. Philadelphia: University of Pennsylvania Press.

Fearon, James D., and David D. Laitin. 1997. "A Cross-Sectional Study of Large-Scale Ethnic Violence in the Postwar Period." Mimeo, University of Chicago.

Ferguson, Charles. 1959. "Diglossia." Word 15:325-340.

Ferguson, Charles. 1966. "National Sociolinguistic Profile Formulas." In Sociolinguistics, ed. W. Bright. The Hague: Mouton.

Fishman, Joshua. 1972. The Sociology of Language. Rowley, Mass.: Newbury House.

Geertz, Clifford. 1973. "The Integrative Revolution." In The Interpretation of Cultures, ed. C. Geertz. New York: Basic Books.

Gellner, Ernest. 1964. Thought and Change. Chicago: University of Chicago Press.

Greenberg, Joseph H. 1956. "The Measurement of Linguistic Diversity." Language 32:109-115.

Grimes, Barbara F., ed. 1996. Ethnologue: Languages of the World. 13th ed. Dallas: Summer Institute of Linguistics.

Gumperz, John, and Dell Hymes, eds. 1972. Directions in Sociolinguistics: The Ethnography of Communication. New York: Holt, Rinehart and Winston.



Gurr, Ted Robert. 1993. Minorities at Risk. Washington, D.C.: United States Institute of Peace.

Hymes, Dell. 1970. "Linguistic Aspects of Comparative Political Research." In The Methodology of Comparative Research, ed. Robert Holt and John Turner. New York: Free Press.

Joseph, John Earl. 1987. Eloquence and Power. New York: Basil Blackwell.

Laitin, David D. 1977. Politics, Language and Thought. Chicago: University of Chicago Press.

Laitin, David D. 1992. Language Repertoires and State Construction in Africa. Cambridge: Cambridge University Press.

Laitin, David D. 1997. "The Cultural Identities of a European State." Politics & Society 25:277-302.

Laitin, David D. 1999. "Language Conflict and Violence." Unpublished manuscript. University of Chicago.

La Porta, Rafael, Florencio Lopez-de-Silanes, Andrei Shleifer, and Robert Vishny. 1998. "The Quality of Government." Presented to the Research Workshop in Political Economy, Harvard University.

Lieberson, Stanley. 1969. "How Can We Describe and Measure the Incidence and Distribution of Bilingualism?" In Description and Measurement of Bilingualism, ed. L. G. Kelly. Toronto: University of Toronto Press.

Lieberson, Stanley. 1981. Language Diversity and Language Contact: Essays by Stanley Lieberson. Stanford: Stanford University Press.

Lijphart, Arend. 1977. Democracy in Plural Societies. New Haven: Yale University Press.

Lucy, John. 1999. "Grammatical Categories and the Development of Classification Preferences: A Comparative Approach." In Language Acquisition and Conceptual Development, ed. S. Levinson and M. Bowerman. Cambridge: Cambridge University Press.

Mackey, W. F. 1976. Bilinguisme et Contact des Langues. Paris: Klincksieck.

McRae, Kenneth. 1983 and following. Conflict and Compromise in Multilingual Societies. Vol. 1, Switzerland; Vol. 2, Belgium. Waterloo, Canada: Wilfrid Laurier University Press.

Rabushka, Alvin, and Kenneth Shepsle. 1972. Politics in Plural Societies: A Theory of Democratic Instability. Columbus: Merrill.

Rubin, Joan, and Bjorn Jernudd. 1971. Can Language be Planned? Hawaii: University of Hawaii Press.

Swadesh, Morris. 1971. The Origin and Diversification of Language. Chicago: Aldine.

Weber, Max. 1968. Economy and Society. Berkeley: University of California Press.

Weinreich, Uriel. 1953. Languages in Contact. The Hague: Mouton.

