Language as a Driver of Migration and

Trade using the Gravity Model:

A Comparative Analysis

Farhat Fasih

Master of Philosophy in Economics

Department of Economics

University of Oslo

May, 2018

© Farhat Fasih, 2018

http://www.duo.uio.no/

Publisher: Reprosentralen, Universitetet i Oslo, Oslo – Norway

Acknowledgments

This thesis marks the completion of my master’s degree and my tremendous journey at

the University of Oslo. Working on the thesis was both challenging and exciting for me.

Nevertheless, it has been a period of intense and progressive learning. I would not be

writing this preface today, had it not been for the support and encouragement of several

people who remained instrumental throughout the process. I would like to reflect on

the people who contributed to my writing, either directly or indirectly:

First and foremost, thank you to my advisor, Professor Andreas Moxnes, for

sharing his insightful knowledge, invaluable input, learning sessions, and

constant availability. I know you have been really patient and kind; thank you for

bearing with me!

I am indebted to this university for providing a platform of learning, for its

inspiring teachers, for all the facilities, and most importantly for the conducive

study environment.

I am particularly indebted to my parents for giving me a life full of opportunities,

and above all for their belief in me, as well as to my siblings for their never-ending

affection.

I am also thankful to my classmates and friends for the study sessions and informal

discussions, and for simply being stress-busters. I also thank Markus for his valuable

comments and Corina for her exhaustive proofreading.

Special thanks to my husband, for his zealous support, patience, and sharing my

responsibilities. Thanks, Fasih!

Finally, apart from all the gratitude and token of thanks, there is one beautiful

soul, my son Aiden, to whom I am really sorry. By this tender age of 18 months,

he has already learned my problems and taught me time management.

All the remaining mistakes and misprints are my sole responsibility.

Farhat Fasih

May 4, 2018

Abstract

Bilateral flows of people (via migration) and goods (via trade) between countries

are central to the field of international economics and share many common

characteristics. This thesis examines the relative significance of language

proximity for international migration and trade simultaneously, using the

standard gravity model. For this purpose, we use a dataset on migration and trade flows

from 223 source countries to 30 OECD destination countries over the period 1980-2006. We

find that language plays an important role in shaping migration patterns and boosting

trade volumes. In addition, the effects of the gravity variables (i.e., GDP and geographical

distance) on migration and trade are in line with economic theory, with small variations.

The estimation results show that countries that are farther apart trade less

and have less migration between them, while economically larger countries are more

engaged in bilateral trade and attract more immigrants.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

LIST OF FIGURES AND TABLES

1 INTRODUCTION

2 THEORETICAL FRAMEWORK AND LITERATURE

2.1 ECONOMICS OF MIGRATION

2.2 MIGRATION AND CULTURAL PROXIMITY TO LANGUAGE

2.3 ECONOMICS OF INTERNATIONAL TRADE

2.4 INTERNATIONAL TRADE AND CULTURAL PROXIMITY TO LANGUAGE

2.5 GRAVITY MODELS OF INTERNATIONAL TRADE AND MIGRATION

3 EMPIRICAL MODEL

3.1 ESTIMATING EQUATIONS

3.2 ESTIMATION METHODOLOGY FOR GRAVITY EQUATIONS

4 DATA DESCRIPTION AND STRUCTURE

4.1 DATA SOURCES

4.2 DATA MERGING

4.3 DEPENDENT VARIABLE(S)

4.4 SUMMARY STATISTICS

5 EMPIRICAL RESULTS

5.1 MAIN RESULTS

5.2 ROBUSTNESS

5.3 ECONOMETRIC ISSUES

6 COMPARATIVE ANALYSIS

7 CONCLUSIONS

BIBLIOGRAPHY

APPENDIX

LIST OF FIGURES AND TABLES

Figure 1 Multi-stage Economic Model of Migration

Figure 2 Multi-dimensional Explanation to Migration-Decision

Figure 3 Determinants of Bilateral Trade Flows

Figure 4 Migration Flows to selected OECD Countries from all world, 1980-2006

Figure 5 Trade Flows to selected OECD Countries from all world, 1980-2006

Figure 6 Migration Flows by Linguistic Proximity Index OECD, 1980-2006

Figure 7 Trade Flows by Linguistic Proximity Index OECD, 1980-2006

Figure 8 Development in Migration and Trade Flows for OECD countries 1980-2006

Table 1 Summary Statistics for the variables subject to analysis

Table 2 Effect of Language Proximity and Gravity Variables on Migration Flows

Table 3 Effect of Linguistic Proximity and Gravity Variables on Trade Flows

Table 4 Robustness Checks: Controlling with additional dummies and variables, Fixed Effect Regression and Poisson Estimation with fixed effect

Table 5 Robustness Checks: Alternative Measures of Linguistic Proximity (Dyen and Levenshtein) for First Official Languages with Controls

Table 6 Multicollinearity Diagnostic: The VIF Method

Table 7 Seemingly Unrelated Regression (SURE) - Migration and Trade Equations

Table A.1 Summary Statistics – Additional Variables from Full Sample

Table A.2 List of Variables and Definition – Full Sample

Table A.3 List of OECD Destination and Source Countries

Table A.4 Definitions and Technical Notes

1. Introduction

Human migration and bilateral trade are two broad subjects in international

economics and their origin can be traced back to the earliest periods of history. Rapid

technological development coupled with advancements in transportation and

communication have significantly increased human mobility and international trade. In

this thesis, we want to analyze the determinants of migration and trade simultaneously,

using the gravity model.

Migration and trade have always been of particular interest to economists. Recent

debates on the determinants of migration and trade have gone beyond the scope of

traditional economic and social variables by categorically separating cultural

characteristics from socio-economic aspects. In evaluating how culture impacts trade

and migration, language has emerged as the most accessible parameter that can serve as

a proxy for culture. For this reason, analysts have been using linguistic distance as a

proxy for cultural distance in recent empirical studies. This is not to say that economists

were previously oblivious to the importance of linguistic diversity. On the contrary,

trade economists (Tinbergen, 1962) realized very early on that sharing a common

language was conducive to trade expansion between countries. Labor economists

Chiswick and Miller (1992) examined whether migration flows were influenced by

common languages and estimated immigrants’ returns to learning the language of their

destination country (see Ginsburgh and Weber, The Palgrave Handbook of Economics and

Language, 2016). Despite these studies and their findings, the use of language as an

economic variable remains rather limited. Most previous studies merely

control for whether agents share a common language using a binary variable.

However, a binary variable may fail to capture the broader influence of common

languages on migration and trade. Only a few studies employ more

sophisticated linguistic measures. For example, Belot and Hatton (2012) use the number

of nodes on the linguistic tree between two languages to construct a linguistic proximity

measure. Belot and Ederveen (2012) use the Linguistic Proximity Index proposed by

Dyen et al. (1992) to show that cultural barriers explain patterns of migration flows

better than traditional economic variables, such as income and unemployment

differentials. More recently, Adserà and Pytlikovà (2015) developed a refined Linguistic

Proximity Index with an additional feature of quantifiability that allows for better

interpretation. For this reason, our study uses the newer Linguistic Proximity Index from

Adserà and Pytlikovà (2015).

Although there is an extensive list of academic papers that estimate a gravity model

for either trade or migration, hardly any research has considered the two models

simultaneously. This has motivated us to explore the importance of linguistic proximity

within the framework of gravity models for migration and trade concurrently rather than

independently. We believe that there are many common factors that can determine

migration and trade simultaneously. In doing so, we can establish a direct comparative

analysis for two apparently different models.

In this context, this thesis contributes to the ongoing discussion on how language

drives migration. What is the relationship between immigrants’ language skills, their

integration in the host country, and the returns to human capital from their source

country? Likewise, how does linguistic diversity shape bilateral trade between importing

and exporting countries? Are the costs of language acquisition an adequate proxy for

trade costs? In our study, we employ a traditional gravity model, as we believe it is the

most straightforward way to produce statistically comparable results. Gravity models

allow us to derive important economic relationships within our framework. We are also

interested in examining how geographic distance and the economic size of both

source and destination countries affect migration and trade, and how they interact

with linguistic proximity in shaping bilateral flows. This further motivates us

to determine whether linguistic distance and geographical distance have a

similar impact on migration and trade.

Our empirical approach uses a series of panel regressions on a dataset of 223

countries, including 30 OECD “destination countries”, over a span of 27 years. We use

the software program Stata 15 to estimate our empirical models.

Our results are mostly aligned with economic theory, with a few exceptions. We find

that countries that are linguistically closer have both larger trade flows and larger

migration flows between them. However, language proximity has a slightly larger effect

on migration than on trade. Moreover, we find that geographic distance enters

negatively and has almost the same impact on both migration and trade. Finally, we find

that economically larger and richer countries are more involved in trade.

The structure of the thesis is as follows. Section 2 provides a theoretical background

on the subject, followed by a literature review. Section 3 builds our econometric model,

including a discussion on methodology and econometric issues. Section 4 introduces

the data, its construction, and sources. Section 5 presents the empirical results and

provides a discussion. Section 6 presents comparative analyses of our migration and

trade models. Finally, we conclude our thesis in Section 7.

2. Theoretical Framework and Literature

This section presents a theoretical background and summarizes previous

economic literature for understanding the underlying drivers of migration and bilateral

trade. The section ends with a note on the gravity model to highlight its relevance for

the topic at hand.

2.1 Economics of Migration

In the current era of globalization, labor

markets are more integrated than ever before. Economic globalization, with its rapid

improvements in transportation and communication, has increased human mobility at

a remarkable speed. In 2015, there were an estimated 244 million international migrants

globally, representing 3.3% of the world population.2 In comparison, in 2000 that figure

was an estimated 155 million people, or 2.8% of the world population.3 Such

developments have made migration economics a fast-growing and exciting research area

for both policy makers and economists across the globe.

Though the vast majority of people worldwide live in their country of birth, more

and more people are migrating to other countries or regions. The predominant motive

for migrating internationally is employment, as migrant workers represent a large

majority of the world’s international migrant stock, most of whom live in high-income

countries. However, not all migration occurs under positive circumstances. In addition,

migration decisions are rarely made on short notice, but rather through a gradual

process over time. Structural factors interact with individual characteristics, embedded

in social and cultural factors. Friberg (2012) and Kley (2011) conceptualized the stages

in the migration process. Inspired by their work, and other theories of migration, we can

describe the migration process as a sequence of stages, visualized in Figure (1) below:

2 UN DESA, 2015a.

3 UN DESA, 2015a.

Figure 1 Multi-stage Economic Model of Migration

Pre-decisional Phase
• Structural factors
• Considering migration

Planning Migration
• Choice of destination
• Socio-economic & cultural factors

Settlement
• Integrating
• Labor market

Author’s adaptation based on economic theory

Migration is a process of cultural and economic adjustment and adaptation that

takes place after migrants move from one environment to another. In attempting to

understand why people migrate, some scholars emphasize individual decision-making,

while others stress broader structural forces. Numerous studies on migration have

emphasized the importance of “push” and “pull” factors. Push factors are those that repel

migrants from their country of origin, such as economic dislocation, population pressure,

religious persecution, or denial of political rights. Meanwhile, pull factors attract

migrants to a destination country, for example through the prospect of higher wages,

greater job opportunities, and increased social safety. More broadly, both push and pull

factors exist at the individual, family, and structural/institutional levels, as illustrated in

Figure (2).

Figure 2 Multi-dimensional Explanation to Migration-Decision

Individual
• Focuses on individual or psychological perceptions
• Advantages that individuals expect to obtain
• Prospects of increased economic opportunity and a higher standard of living

Familial
• Focuses on family needs
• Desire for family security and improved wellbeing
• Remittances: cash payments received by family and relatives from a migrant member

Structural and Institutional
• Focuses on broad social, political, cultural and economic contexts that encourage or discourage population movement
• Factors that stimulate migration, including better social expenditure infrastructure in destination countries and income differentials between countries
• Migration induced by war or political instability
• Other factors: immigration laws and immigrants' networks at the destination

Thus, both cultural and economic factors are drivers of migration. While

individual and familial factors can also shape migration decisions, they do not occur in

isolation and are closely related to broader socio-economic variables, either directly or

indirectly.

2.2 Migration and Cultural Proximity to Language

Researchers and analysts have explored the field of human migration from as

early as the 1850s and since then, the research in this field has been extensive. Economic

theory historically placed great emphasis on migration as ‘factor mobility’ and on

migrants as ‘factors of production’. Most literature on the determinants of migration is

limited to social and economic aspects with push and pull factors, as summarized in

Section 2.1 above. However, recent research specifically separates cultural characteristics

from economic indicators to better analyze global migration trends. For instance, Belot

and Ederveen (2012) found a strong negative effect of cultural differences on

international migration flows. Another study by Gidwani and Sivaramakrishnan (2004)

conceptualizes the cultural dynamics of migration by examining the linkages between

culture, politics, space, and labor mobility. Manning and Roy (2010), in a theoretical and

empirical framework, discuss the cultural assimilation of immigrants in the UK, British

identity, and views on rights and responsibilities in society (see Constant and Zimmermann,

International Handbook on the Economics of Migration, 2013). They find that almost

all UK-born children of immigrant parents see themselves as British, and that others feel

more British the longer they stay in the UK. However, not all of the white UK-born

population thinks of these immigrants as British, because they are more concerned about

values than national identity. In yet another study, Mayda (2009) investigated the determinants of

migration inflows in 14 OECD countries and examined the impact of geographical,

cultural, and demographic factors, as well as migration policy. She found her results to

be consistent with international migration models. Having established the importance

of cultural variables in shaping migration behavior, we identify language as the most

prominent and measurable one. The role of language in migration is a relatively new

field of research. Previous literature has shown that immigrants’ fluency in the language

of the destination country, or their ability to learn it quickly, plays a vital role in

transferring existing human capital from the country of origin to the destination country,

and generally boosts immigrants’ success in the destination countries’ labor markets.

These findings are supported by the works of Kossoudji (1988), Dustmann (1994),

Dustmann and van Soest (2001, 2002), Chiswick and Miller (2002, 2007, 2010), and

Dustmann and Fabbri (2003). Furthermore, Bleakley and Chin (2004, 2010) find that

linguistic competence is a key variable in explaining disparities between immigrants in

terms of their educational attainment, earnings, and social outcomes. Studies by

Chiswick and Miller (2005) show that it is easier for a foreigner to acquire a language if

his or her native language is linguistically closer to the target language. In addition, a

widely-spoken native language in the destination country can act as a pull factor in

international migration, as the costs associated with skills transferability are lower.

While the above-mentioned studies establish the important role of language in

international migration, the contributions of Adserà and Pytlikovà (2015) stand out as

the first to disentangle the multidimensional roles of language by taking into account

linguistic proximity, widely-spoken languages, linguistic communities, and language-

based immigration policies at the destination country. Their study examined the

determinants of migration by collecting a unique and large dataset on annual migration

stocks and flows for 30 OECD destination countries. They constructed a new set of

refined indicators for the linguistic proximity between two languages based on

Ethnologue (an encyclopedia of languages). Their main finding is that migration

rates are 20% higher between countries with a common first official language. The same

result also holds when considering the linguistic proximity between any other official,

main, or major language. Few economic studies employ such a multidimensional

linguistic measure.

2.3 Economics of International Trade

From a historical perspective, international trade has changed the world

drastically over the last few centuries. Technological advancements have intensified

world trade. Recent decades have been marked by falling transportation and

communication costs alongside a rise in preferential trade agreements, particularly

among developing countries. Free international trade is generally considered

desirable because it allows countries to specialize. This is the essence of

comparative advantage, the foundational argument for gains from trade in

economic theory. It is therefore not surprising that most countries that export goods to

another country also import goods from that country.

Globally, international trade has reached remarkable levels. Total world exports

(merchandise and services combined) in 2015 were valued at 21 trillion US dollars

(WTO, World Trade Statistical Review 2016). Today, the sum of exports and imports

across nations accounts for more than 50% of global production. The volume of world

trade grew at a rate of 2.7% in 2015, which was roughly in line with the growth rate of

world GDP at 2.4% in the same year (WTO, 2016).

Beyond trade theory, other factors also determine trade patterns in

practice. Significant contributions have been made to the study of both

traditional and new determinants of bilateral trade flows, notably in the works of Julian

Gourdon (2009, 2011) and Nicita and Tumurchudur-Klok (UNCTAD, 2011).6 One

important determinant of trade is the geographic distance between two countries. Both

economic theory and empirical studies have shown that trade diminishes dramatically

with distance. The impact of geographical distance on trade has long been studied in the

empirical economics literature, typically through gravity models of trade. The main

finding has been a strong negative relationship between geographic distance and trade

(Eaton and Kortum, 2002). Another important factor is a country’s economic size.

Bilateral trade has been found to be directly proportional to the respective GDP of the

trading partners. Furthermore, Ventura (2006) reported a strong positive correlation

between per capita income growth and trade growth. Meanwhile, Frankel and Romer

(1999) use geographic characteristics as an instrument for trade

and, following this logic, suggest that trade affects economic growth as

well. In addition, some patterns of trade seem rather arbitrary, in that countries

do not always export labor-intensive goods to capital-abundant countries and vice versa.

6 Alessandro Nicita and Bolormaa Tumurchudur-Klok Report UNCTAD (2011), Geneva

Linder (1961) explains the emergence of zero trade flows with differing demand patterns.

The demand for certain goods depends heavily on income level, which explains why

some goods are not demanded in certain parts of the world. However, this explanation has

its limits: even after controlling for proximity and gravity factors such as size, distance,

and Regional Trade Agreements (RTAs), every country has potential outlier trade

partners whose trade cannot be explained by similarity in income levels alone. Such

partners include neighboring countries, countries in the origin country’s area of influence,

and countries with historical ties (such as colonial ties). The U.S. trades unexpectedly large

amounts of goods with Mexico and Canada, and the same holds for China, Japan, and

Korea. Similarly, looking at India, we can consider Sri Lanka, Pakistan, Bangladesh, and

Nepal as outliers. In contrast, Germany seems to be completely in line with economic

predictions, mainly because Germany has a similar income level to other EU member

states. In sum, the determinants of bilateral trade can be visualized in Figure (3) below:

Figure 3 Determinants of Bilateral Trade Flows

Gravity Variables
• Gravity variables have been the primary focus of empirical studies of international trade
• Distance
• Common border
• Cultural distance
• Colonial ties
• Economic scale, measured through GDP
• Population, which is indirectly taken into account if GDP per capita is used

Factor Endowments
• A country's factor endowments are an important determinant of its pattern of trade, in line with the widely accepted H-O model of international trade
• Human capital, measured by labor inputs
• Physical capital
• Land

Other factors
• There are other factors that may dampen or stimulate trade
• Trade barriers such as quotas
• Technology
• Regional trade agreements, trade treaties, free trade agreements

2.4 International Trade and Cultural Proximity to Language

The theory of international trade is one of the oldest branches of academic

literature. The significant research and empirical work undertaken to date has explored

the determinants of international trade through multiple dimensions. Among them,

cultural distance has been identified as an important determinant of bilateral trade.

However, empirically quantifying and testing “cultural distance” is difficult due to the

elusiveness of the concept and its lack of observability. Cultural proximity has received

less attention, although empirical trade flow models typically include it in some way or

another (Boisso and Ferrantino, 1997). Cultural proximity may influence bilateral trade

through two channels: a preference channel and a trade cost channel. In other words,

two culturally close countries tend to have a higher propensity to trade because they

have strong tastes for each other’s products and because trade costs between them are

relatively low. Felbermayr and Toubal (2010) attempted to disentangle the trade cost

channel from the preference channel of cultural proximity in an empirical bilateral trade

flow model.

Social scientists and economists often define culture through trust, language, and

information. In particular, cultural and language differences result in communication

barriers, which reduce the level of trust between economic agents and make exchanging

information costlier. Economic theory and empirical evidence suggest that sharing a

similar official language or speaking a common language provides an important stimulus

to international economic exchange. Gokmen (2017) studied the clash of civilizations

hypothesis using trade data and measures of cultural differences. He evaluated how the

impact of cultural differences on trade evolved over time during and after the Cold War.

He found that the negative influence of cultural differences on trade has become more

prominent in the post-Cold War era. For instance, ethnic differences reduced trade by

24% during the Cold War, whereas this reduction is 52% in the post-Cold War period.

This suggests that the impact of cultural differences on trade varies over time.

Recently, there has been widespread agreement that cultural proximity plays an

important role in determining trade flows between countries. The literature has used

different variables to proxy cultural ties, such as the existence of a common language,

religion, or ethnicity (Boisso and Ferrantino, 1997; Frankel, 1997; Melitz, 2008). While

these variables clearly capture cultural proximity, they also reflect other trade-creating

factors, such as lower communication costs. A growing body of work in the literature

analyzes the relationship between common language and international trade. As

predicted by the gravity model and shown empirically in Melitz (2008), Egger and

Lassmann (2011), Melitz and Toubal (2014) and Egger and Lassmann (2015), common

native and spoken linguistic traits affect different margins of bilateral trade to a

considerable extent. Egger and Lassmann (2012) found that having a common language

increases trade flows by about 40%.
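
Percentage effects such as the 40% figure above are usually read off the coefficient on a dummy variable in a log-linear gravity regression. The short sketch below is an illustration only and is not taken from the cited studies: it converts between a coefficient β on a common-language dummy and the implied percentage change in trade, 100·(e^β − 1). The function names are hypothetical.

```python
import math

def pct_effect_from_coef(beta):
    """Percentage change in the flow implied by a dummy coefficient beta
    when the dependent variable is measured in natural logs."""
    return 100.0 * (math.exp(beta) - 1.0)

def coef_from_pct_effect(pct):
    """Inverse mapping: the log-linear coefficient implied by a percentage change."""
    return math.log(1.0 + pct / 100.0)

# Coefficient that corresponds to a 40% increase in trade: ln(1.4).
beta_language = coef_from_pct_effect(40.0)
print(round(beta_language, 3))

# A reduction is entered as a negative percentage and gives a negative coefficient.
beta_reduction = coef_from_pct_effect(-24.0)
print(beta_reduction < 0)
```

The distinction matters for larger effects: a coefficient of 0.40 implies an increase of roughly 49%, not 40%, so coefficients and percentage effects should not be used interchangeably.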

In an interesting study, Melitz and Toubal (2014) construct a new series for

common native languages for 195 countries, used together with series for common

official language and linguistic proximity to draw inferences about the aggregate impact

of all linguistic factors on bilateral trade and to isolate the role of facilitated

communication from those of ethnicity and trust. Their results imply that the impact of

linguistic factors, taken together, is at least twice as large as that of the binary variable for

common language typically used in other studies.

Language barriers are now considered more important to international trade than

previously thought. Lohmann (2011) used a language barrier index that quantifies more

detailed linguistic data to show that language barriers are significantly and negatively

correlated with bilateral trade, using a gravity model of trade. There is also

prominent research on the role of English as a second language in overcoming language

barriers, such as Ku and Zussman (2010). Today, English is the leading candidate to

play the role of lingua franca, and Ku and Zussman demonstrate that the ability to communicate

in English has a strong effect in promoting trade across the globe.

2.5 Gravity Models of International Trade and Migration

Gravity models have been one of the most successful empirical tools, especially

in the international trade literature. Researchers from various fields have used gravity

models to predict population movement, cargo shipping volume, inter-city

telecommunication, as well as bilateral trade flows between countries (Simini et al.,

2012). While the first contemporary form of a gravity model in the social sciences can be

traced back to 1946 (George Kingsley Zipf), it was not until 1962 that the first empirical

application of the gravity model was made, by Nobel Laureate Jan Tinbergen, to examine

international trade flows. Since then, international trade researchers have widely used

the model to provide more accurate predictions about trade flows between countries for

various goods and services. Its recent applications have estimated and interpreted

spatial relations in both trade and factor movement (Anderson, 2011).

The gravity model of international trade is based on Newton’s universal “law of

gravitation” and states that bilateral trade between two countries is proportional to

the product of countries’ sizes (measured by GDP) and inversely proportional to the

distance between them (Head and Mayer, 2013). This implies that countries that are

relatively closer geographically and similar in economic size will be more engaged in

bilateral trade due to reduced trade costs (Feenstra, 2016). Empirically, trade volumes

between two countries, say exporting Country i and importing Country j, can be

explained using the simplest form of the traditional gravity equation:

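$$\ln X_{ij} = \ln Y_i + \ln Y_j - \rho \ln D_{ij}$$

where all variables are expressed in natural logarithms and

$X_{ij}$ = value of exports from $i$ to $j$

$Y_i$ = value of GDP in $i$

$Y_j$ = value of GDP in $j$

$D_{ij}$ = distance from $i$ to $j$

$\rho$ = constant parameter

This specification is analogous to Newton’s law of gravity.

The gravitational force between two objects is

$$F_{ij} = G \frac{m_i m_j}{d_{ij}^{2}}$$

where $m_i$ and $m_j$ are the masses of the two objects, $d_{ij}$ is the distance between them, and $G$ is the gravitational constant. Taking logs,

$$\ln F_{ij} = \ln G + \ln m_i + \ln m_j - 2 \ln d_{ij}$$

The gravity equation of international trade is

$$X_{ij} = k \frac{Y_i Y_j}{d_{ij}^{\rho}}$$

where $Y_i$ and $Y_j$ are the values of GDP in $i$ and $j$, $d_{ij}$ is the distance from $i$ to $j$ (written $D_{ij}$ above), and $k$ is a constant. Taking logs,

$$\ln X_{ij} = \ln k + \ln Y_i + \ln Y_j - \rho \ln d_{ij}$$

which reduces to the simplest form above when $k = 1$.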
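
To make the link between the levels form and the log form of the gravity equation concrete, the following minimal sketch evaluates both for one country pair. It is an illustration under stated assumptions: the function names and all numerical inputs (the constant k, the GDP values, the distances, and the parameter rho) are hypothetical and are not estimates from this thesis.

```python
import math

def gravity_flow(k, y_i, y_j, d_ij, rho):
    """Predicted flow in levels: X_ij = k * Y_i * Y_j / d_ij**rho."""
    return k * y_i * y_j / d_ij ** rho

def log_gravity_flow(k, y_i, y_j, d_ij, rho):
    """Same prediction in logs: ln k + ln Y_i + ln Y_j - rho * ln d_ij."""
    return math.log(k) + math.log(y_i) + math.log(y_j) - rho * math.log(d_ij)

# Hypothetical inputs, chosen only for illustration.
k, y_i, y_j, rho = 1e-9, 2.0e12, 5.0e11, 1.0

near = gravity_flow(k, y_i, y_j, 1000.0, rho)
far = gravity_flow(k, y_i, y_j, 2000.0, rho)

# With rho = 1, doubling the distance halves the predicted flow.
print(far / near)

# Exponentiating the log form recovers the levels form.
print(math.isclose(math.exp(log_gravity_flow(k, y_i, y_j, 1000.0, rho)), near))
```

Because the log form is linear in its parameters, it is the version that can be taken to the data with linear regression methods.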

The term “gravity equation” refers to the specification of the determinants of

bilateral trade flows. Theoretically, several studies have modelled the gravity equation,

including Krugman (1980), Eaton and Kortum (2002), and Melitz (2003). Generally,

across many applications, the estimated coefficients on the “mass” variables (represented
by GDP) cluster around 1, while the distance coefficients cluster close to –1. Most
observations lie close to the fitted line, and the model typically captures about 80-90% of
the variation in trade flows. The fit of the traditional gravity model improves when other
proxies for trade frictions are incorporated, such as geographical borders and common
language effects. In a well-known study of what became known as the border puzzle,
McCallum (1995) found that intra-national trade is much higher than international trade.
He compared trade flows between Canadian provinces with flows across the U.S.-Canada
border and, surprisingly, found that trade between Canadian provinces was about 22 times
higher than trade between provinces and U.S. states, after controlling for economic size
and distance.

As gravity models continued to be tested empirically, the traditional model

received considerable criticism. The main critique was its lack of theoretical

foundation. However, numerous studies have attempted to bridge this theoretical gap

and present more functional forms of the gravity model, notably Anderson (1979),

Bergstrand (1985, 1990), and, more recently, Head and Mayer (2013). By 2004, gravity

models’ connection to economic theory became well-established (James E. Anderson,

2011) and they began appearing in economics textbooks (Feenstra, 2004).

The 1990s witnessed an ongoing debate about reshaping the specification of

gravity models into a better functional form. A more elaborated theoretical framework

was presented by Anderson and Van Wincoop (2003), which has been well

appreciated in the applied literature. Their gravity methodology was also presented

as a solution to McCallum’s border puzzle. However, despite its popularity, this new

model also had its assumptions and limitations. More recent debates on gravity concern
estimation techniques. The traditional literature employed ordinary least squares (OLS)
to estimate the gravity equation. However, more recent literature, for example Redding
and Venables (2004), has suggested including importer and exporter fixed effects to
capture multilateral resistance terms and to control for omitted variables. Santos Silva
and Tenreyro (2006) contributed further by suggesting a Poisson pseudo-maximum-likelihood
estimator to handle zeroes in the trade data. The economic literature shows that gravity
models have also been successfully applied to estimate the impact of a common (official
or spoken) language on bilateral trade. Egger and Lassmann (2012) found that, on average,
a common official or spoken language between countries directly increases trade flows
by 40%.

More recently, using gravity equations to estimate migration flows has become

quite popular among researchers and economists. We now turn to the gravity model

of international migration, founded in the groundbreaking work of Ravenstein (1889)

who used the gravity model to study migration patterns in the UK. Years later,

Stewart (1941) specified aggregate models of migration, which may be considered as

modified versions of gravity models. These models are “gravity-like” because they

hypothesize that migration is directly related to population size in the origin and

destination regions and inversely related to distance. The same modeling techniques

used in the trade literature can also be applied to migration flows (Anderson, 2011)

and other interactions. Again, adapting from Newton’s law of gravity, the functional

form of gravity equation for migration can be presented as:

\[ T_{ij} = G \, \frac{m_i^{\alpha} \, n_j^{\beta}}{f(d_{ij})} \]

where $T_{ij}$ is the number of individuals who move from location $i$ to location $j$
per unit of time (or the migration rate, when expressed per source population). It is
proportional to characteristics of the source ($m_i$) and destination ($n_j$) locations,
such as population size, and declines with the distance $d_{ij}$ between them; $\alpha$
and $\beta$ are two adjustable exponents.
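As an illustration of this functional form, the sketch below evaluates the migration gravity equation for two hypothetical distances. The power-law distance function f(d) = d^gamma, the default parameter values, and the population figures are assumptions made only for this example; the equation above leaves f unspecified.

```python
def gravity_flow(m_i, n_j, d_ij, G=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """T_ij = G * m_i**alpha * n_j**beta / f(d_ij), with f(d) = d**gamma (assumed)."""
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return G * (m_i ** alpha) * (n_j ** beta) / (d_ij ** gamma)


# Hypothetical populations (5 and 60 million) and distances in kilometers.
near = gravity_flow(5e6, 60e6, d_ij=500.0)
far = gravity_flow(5e6, 60e6, d_ij=1000.0)

# With gamma = 2, doubling the distance divides the predicted flow by four.
print(near / far)
```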

Overall, we find that the academic literature provides significant evidence for
treating language as the best candidate for representing cultural characteristics
in a gravity model setting, whether for trade or migration.


3. Empirical Model

This section describes the empirical strategy used to model how linguistic

proximity affects international migration flows to a particular destination, as well as how

linguistic proximity may influence bilateral trade. This section also presents the

econometric methodology adopted to estimate our empirical gravity-based models.

3.1 Estimating Equations

Previous empirical findings and theoretical background, as discussed in Section

2, have shown that we cannot fully understand global trade and migration by relying on

economic variables alone, without explicitly accounting for cultural factors as well. In

terms of statistical interpretation, the most realistic proxy for culture is language. In the

context of migration, language skills are a key variable in explaining immigrants’

disparities in educational attainment, earnings, and social outcomes. Likewise, language

(and cultural) barriers create trade costs that can stifle international trade. However, the

way in which previous studies accounted for language has been limited to controlling

for sharing a common language through a binary variable. A binary variable may not

capture the full impact of language on bilateral trade and migration. Therefore, in this

thesis, we employ the elaborated index of linguistic proximity developed by Adserà and

Pytliková (2015), which has gained significant popularity in the academic world since it was

first introduced.

Our objective is to estimate the impact of linguistic proximity as a driver of

migration and trade simultaneously, using the same independent variables. Our

approach is to study these models in parallel. To the best of our knowledge, no previous

academic work has taken our specific approach to answering this question. Our

approach enables us to conduct a parallel comparative analysis of two apparently

different models. Furthermore, we are interested in examining how gravity variables

(such as countries’ geographical distance and economic size) interact with linguistic

proximity to control its impact on bilateral flows of goods and people. We would also


like to examine whether linguistic distance and geographical distance impact migration

and trade with the same magnitude.

a. Migration Model

For simplicity, we employ a basic gravity model, which allows for straightforward

interpretation of the coefficients. In addition, we use panel regressions with fixed effects

to control for omitted variables and unobserved characteristics. We opted for a
log-linear model, which suits the non-negative, count-valued dependent variable. A
common problem in estimating gravity models is that the data often contain many zero
observations, for which the logarithm is undefined. We resolve this by adding 1 to each
observation of immigration flows, so that zero observations are not discarded when
taking the logarithm. This avoids the bias that would arise from dropping zero
observations. Our migration model, based on the gravity methodology with fixed effects,
reads as:

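\[
\ln(\text{Migration}_{ijt}) = \beta_0 + \beta_1 \ln(\text{GDP}^{pc}_{i,t-1}) + \beta_2 \ln(\text{GDP}^{pc}_{j,t-1}) + \beta_3 \ln(\text{GDP}_{i,t-1}) + \beta_4 \ln(\text{GDP}_{j,t-1}) + \beta_5 \ln(D_{ij}) + \beta_6 L_{ij} + \delta_i + \delta_j + \theta_t + \varepsilon_{ijt} \tag{1}
\]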

The key variables in equation (1) are:

$\text{Migration}_{ijt}$ denotes the gross flow of migrants from source country $i$ to destination country $j$ (in other words, the number of emigrants from country $i$ who immigrate to country $j$) at time $t$, where $i = 1, 2, \dots, 223$; $j = 1, 2, \dots, 30$; and $t$ covers 27 years;

$\text{GDP}^{pc}_{i,t-1}$ denotes GDP per capita (in current US dollars) in source country $i$ at time $t-1$, a lagged economic indicator that captures development effects;

$\text{GDP}^{pc}_{j,t-1}$ denotes GDP per capita (in current US dollars) in destination country $j$ at time $t-1$, a lagged economic indicator that captures development effects;

$\text{GDP}_{i,t-1}$ denotes GDP (in current US dollars) of source country $i$ at time $t-1$, to capture economic size and provide country-specific characteristics;

$\text{GDP}_{j,t-1}$ denotes GDP (in current US dollars) of destination country $j$ at time $t-1$, to capture economic size and provide country-specific characteristics;

$D_{ij}$ denotes the geographic distance between the capital cities of source country $i$ and destination country $j$ in kilometers;

$L_{ij}$ denotes the Linguistic Proximity Index, which measures the linguistic closeness between country pairs;

$\theta_t$ denotes year dummies, which control for common shocks over the time period; standard errors are robust and clustered at the source-destination country-pair level;

$\delta_i$ and $\delta_j$ denote country-of-origin and country-of-destination fixed effects, included separately to capture unobserved characteristics;

$\varepsilon_{ijt}$ denotes the idiosyncratic error term.

All variables used in the estimations, except for dummy variables and the

Linguistic Proximity Index, are expressed in their natural logarithm form. This

specification has the added advantage of easy interpretation of the estimated parameters

as being relative elasticities – in other words, by how much do migration flows between

a given source-destination country pair increase when GDP of either country increases

by 10%, while holding all other variables constant.
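To make the elasticity interpretation concrete, the sketch below converts a log-log coefficient into the implied percentage change in the flow. The elasticity of 0.8 is a hypothetical value, not an estimate from this thesis, and the helper name is an assumption. Because the dependent variable here is the log of the flow plus one, the interpretation is only approximate for very small flows.

```python
def pct_change_in_flow(elasticity, pct_change_in_x):
    """Exact percentage change in the flow implied by a log-log coefficient."""
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0


# Hypothetical elasticity of 0.8 and a 10% increase in GDP.
exact = pct_change_in_flow(elasticity=0.8, pct_change_in_x=10.0)

# The usual linear approximation: elasticity times the percentage change.
approx = 0.8 * 10.0

print(f"exact: {exact:.2f}%, approximation: {approx:.2f}%")
```

The exact figure is slightly below the linear approximation, and the gap widens as the change in the explanatory variable grows.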

The relative differences in economic development and size between origin and

destination countries are lagged by one period to account for the information available

to a potential migrant at the time of deciding whether or not to migrate. Lagging the

economic explanatory variables and treating them as predetermined with respect to

current migration flows reduces the risk of reverse causality in our model.
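One simple way to construct such one-period lags is sketched below. The GDP figures are made up for illustration and the helper name is hypothetical; the first year of each country's series has no lagged value and would drop out of the estimation sample.

```python
# Hypothetical GDP series keyed by (country code, year); not thesis data.
gdp = {("NOR", 2000): 171.0, ("NOR", 2001): 174.0, ("NOR", 2002): 195.0}


def lagged(series, country, year, lag=1):
    """Return the value from `lag` years earlier, or None if unavailable."""
    return series.get((country, year - lag))


# Attach the lagged value to each country-year observation.
rows = [
    {"country": c, "year": y, "gdp": v, "gdp_lag1": lagged(gdp, c, y)}
    for (c, y), v in sorted(gdp.items())
]

for row in rows:
    print(row)
```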

The specification in equation (1) assumes that migration inflows to a particular

destination are driven by differences in economic development, economic size between

source and destination countries, and the costs of migration in the form of language

barriers and geographical distance. We expect economic theory to hold true that

migration decisions involve a comparison of country- or region- specific variables. High

wages or income levels in the source country and high migration costs discourage

emigration. GDP per capita is normally used as a proxy for wages or income, while the

physical and cultural distance between two countries acts as a proxy for migration costs.

The existing literature confirms that the propensity to migrate decreases with higher

levels of GDP per capita in the source country and larger distances between the source

country and destination country, as the economic incentives to migrate to other

countries decline. A potential emigrant takes these variables into account when


choosing to reside in the country where his utility is maximized among the possible

destinations. To control for the economic size of countries and to provide their specific

characteristics, we also include GDP at both source and destination countries. We

assume that the propensity to migrate between a specific country pair depends in part

on economic size, as countries with relatively higher GDP or that are more developed

tend to attract more immigrants.

Beyond the economic dimension, we also expect costs associated with migration

to increase with the geographical, cultural, and linguistic distance between countries.

The variable Lij measures linguistic proximity between countries (details in
Section 4 on data construction) and lets us test our hypothesis that larger
language barriers increase migration costs for a potential migrant by hindering
skills transfer and integration in the receiving country. We use log-distance in

kilometers between the capital cities of sending and receiving countries to control for

the effect of geographical distance. Finally, our model includes year dummies to control

for common idiosyncratic shocks over the time period. Our model also contains country

of destination and country of origin fixed effects separately to capture unobserved

characteristics, for example immigration policy in destination country, credit market

constraints at origin as well as climate, openness towards foreigners or culture in each

country, networks of family and friends, and the population from the same origin

already living in the destination country, among other things. Meanwhile, the country

fixed effects terms mitigate the risk of omitted variable bias by controlling for any

unobserved permanent differences between countries.

b. Trade Model

In parallel to the migration model described above, we estimate trade flows using

a modified gravity model with fixed effects on panel data. Our basic model takes the

form of:

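\[
\ln(\text{Trade}_{ijt}) = \beta_0 + \beta_1 \ln(\text{GDP}^{pc}_{i,t-1}) + \beta_2 \ln(\text{GDP}^{pc}_{j,t-1}) + \beta_3 \ln(\text{GDP}_{i,t-1}) + \beta_4 \ln(\text{GDP}_{j,t-1}) + \beta_5 \ln(D_{ij}) + \beta_6 L_{ij} + \delta_i + \delta_j + \theta_t + \varepsilon_{ijt} \tag{2}
\]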

The dependent variable in equation (2) is Tradeijt, which denotes gross trade flows from


exporting country i (akin to “source country” in the migration model) to importing

country j (akin to “destination country”) at time t. It is worth noting that the trade data,
collected from the Direction of Trade Statistics (DOTS), often report two different
values for the same flow from Country A to Country B, because Country A's reported
exports to Country B need not match Country B's reported imports from Country A. In other
words, the same trade flow may be recorded as either an export or an import, depending on
which country reports the exchange. Some studies suggest taking the average of these two
values. In addition, trade data are recorded in millions of dollars with 1 or 2 decimal
places, so small but positive trade flows can be rounded to erroneous zero values. Many
missing observations are also substituted with zeroes. Such structured zeros are better
treated as missing observations than as true zeroes.

The independent variables in equation (2) are the same as those in equation (1), but
they operate through different channels. As before, all variables used in the estimation,
except dummy variables and the Linguistic Proximity Index, are expressed in natural
logarithms. The zeroes in the data are handled by adding one to each observation of trade
flows, so that taking the logarithm does not discard zero observations. We estimate trade
flows using fixed effects regressions to account for unobserved variables within the data.
We also include the GDP of both the exporting and the importing country, since these
variables are assumed to be correlated with the unobservable effects. We also use year
dummies and country fixed effects in our estimation to control for heterogeneity within
the model. Using country fixed effects mitigates omitted variable bias in the panel data
arising from unobserved differences across entities (countries i and j) that do not change
over time. Omitted variable bias could also arise from factors that affect all countries
in the same way but vary over time; the year dummies absorb such common shocks. In
addition, we lag the economic explanatory variables and treat them as predetermined with
respect to current trade flows, which reduces the risk of reverse causality in our model.

Our empirical gravity model in equation (2) estimates the determinants of

bilateral trade. These are driven by differences in the economic size of trading countries

and trade costs. We expect exports to rise proportionally with the economic size of the

destination country and imports to rise in proportion to the size of the source economy.


Empirically, bilateral trade patterns between two countries are proportional to their

GDPs and inversely proportional to the distance between them. As previously discussed,

cultural proximity may influence bilateral trade through both preferences and trade

costs. Two culturally close countries may trade a lot because they have strong tastes for

each other’s products and/or because trade costs are relatively low. Sharing a common

language is an important symmetric component of cultural proximity, as are religion

and ethnicity. In this empirical bilateral trade flow model, we may attempt to

disentangle the trade cost channel (i.e. physical distance) from the preference channel

of cultural proximity (i.e. linguistic distance) to estimate whether linguistic distance and

geographical distance have a similar impact on trade.

3.2 Estimation Methodology for Gravity Equations

We now turn to the econometric methods used to estimate the standard gravity

model of migration and trade.

a. The Ordinary Least Squares (OLS) Estimators

Ordinary Least Squares (OLS) is one of the most widely used methods of data analysis
and the usual starting point for regression analysis, particularly in the social
sciences. OLS estimation follows the assumptions of the classical linear regression
model. It chooses the regression coefficients so that the fitted regression line lies as
close as possible to the observed data, in the sense of minimizing the sum of squared
residuals. OLS is a natural tool for studying the econometric relationship between
bilateral flows, GDP, and distance, and it has been the most commonly used method for
estimating traditional gravity models. Using OLS for gravity model estimates requires the
following conditions to hold:

The other factors contained in gravity error term (𝜺𝒊𝒋𝒕) have conditional mean zero

and are uncorrelated with each of the explanatory variables.

The errors are independently drawn from a normal distribution with a constant

variance (homoscedasticity assumption).

None of the explanatory variables is a linear combination of the other explanatory
variables (no perfect multicollinearity).

If all three conditions are met, the OLS estimates are consistent, unbiased, and

efficient. In our estimation, there are unobserved characteristics that might affect

migrant flows and trade volumes, which means that the homoscedastic error terms

assumption is very strong and may not necessarily hold, thus making OLS estimates

inefficient. OLS also presumes that error terms in trade data are independent across
country pairs, such that country i's exports to country j are independent of i's imports
from j. This assumption would be violated in practice if, for example, a country produces
and exports a product that is used as an input in the production process of another
country. In that case, the gravity error terms are correlated with each other and the OLS
estimates are inefficient, although this does not affect their consistency or
unbiasedness. As we are using a log-linear model, zero-valued observations would otherwise
be lost when taking logarithms, at a cost to efficiency. We resolve this problem by adding
1 to each observation of trade flows and migration flows. We use year dummies (a time
factor t) to control for common shocks over time and to capture the influence of aggregate
trends that might affect the explanatory variable(s). This is important because
time-varying variables may otherwise be spuriously related through aggregate trends in,
for example, economic development, inflation, and population growth. This is more probable
in the case of panel data. Conventional OLS regression serves as a baseline reference
point in our

results, which is then compared to alternative specifications with country-fixed effects.

b. The Fixed Effects Regression

Estimates from OLS have a tendency to produce biased results due to their

inability to take unobserved variables into account, which are instead captured by the

error term. In contrast, fixed effects regressions on panel data control for omitted

variables if the omitted variables vary across entities but are constant over time.

In panel data, the regression error can be correlated over time within an entity.

As with heteroscedasticity, this correlation does not introduce bias into the fixed effect

estimator, but it does affect the variance of the fixed effects estimator. By using fixed

effects regressions, we control for omitted variables in the traditional gravity model. This

does not require the assumption of symmetrical trade costs. We take country-specific

fixed effects into account by creating dummies for exporting and importing countries.


This way, we assume an unobserved heterogeneous component that is constant over time
within each specific country. Potential unobserved sources of heterogeneity in the trade
model are, for example, trade policies, regional trade agreements (RTAs), or treaties
signed between countries that affect trade volumes, while in the migration model these
sources could be the stock of immigrants already at the destination, migration policy, or
cultural proximity, all of which affect bilateral migration flows. Likewise, fixed effects
in the migration model mean that a dummy variable equals unity each time a particular
country appears in the dataset. The standard errors in our fixed effects regressions are
clustered, which makes them robust to both heteroscedasticity and correlation over time
within an entity.
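The role of the country dummies can be illustrated with a minimal simulation. The sketch below is not the estimation code used in this thesis: the panel dimensions, the true coefficient of –1, and the helper names `ols` and `dummies` are all assumptions, and standard errors (clustered or otherwise) are not computed. Unobserved origin and destination effects are made to be correlated with the regressor, so pooled OLS is biased while the dummy-variable regression recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n_orig, n_dest, n_years = 30, 10, 8
beta_true = -1.0

# Index arrays for a complete origin x destination x year panel.
o, d, t = np.meshgrid(np.arange(n_orig), np.arange(n_dest),
                      np.arange(n_years), indexing="ij")
o, d, t = o.ravel(), d.ravel(), t.ravel()

# Unobserved, time-invariant country effects that also shift the regressor.
a = rng.normal(0.0, 1.0, n_orig)
b = rng.normal(0.0, 1.0, n_dest)
x = 0.8 * a[o] + 0.8 * b[d] + rng.normal(0.0, 1.0, o.size)
y = beta_true * x + a[o] + b[d] + rng.normal(0.0, 0.5, o.size)


def ols(design, target):
    """Least-squares coefficients for target regressed on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def dummies(codes, n_levels):
    """One dummy column per level, dropping the first as the reference."""
    full = (codes[:, None] == np.arange(n_levels)[None, :]).astype(float)
    return full[:, 1:]


const = np.ones((o.size, 1))

# Pooled OLS ignores the country effects, which end up in the error term.
pooled = ols(np.column_stack([const, x]), y)[1]

# Origin, destination, and year dummies absorb the unobserved effects.
fe_design = np.column_stack([const, x, dummies(o, n_orig),
                             dummies(d, n_dest), dummies(t, n_years)])
fixed = ols(fe_design, y)[1]

print(f"pooled OLS: {pooled:.2f}, with fixed effects: {fixed:.2f}")
```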

Fixed effects also come with some issues. Variables that only vary in the same

dimensions as the fixed effects will become perfectly collinear and the regression will

drop them. At the same time, fixed effects regressions do not solve endogeneity bias

because unobserved variables might vary in another dimension than the fixed effects.

Hence, absence of omitted variable bias is not fully guaranteed. Endogeneity occurs

when an explanatory variable is correlated with the error term.
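The collinearity problem can be seen in a small numerical sketch. The dimensions and values below are hypothetical; the example checks the rank of the design matrix when a variable that varies only by origin country (and not over time or destination) is added alongside origin dummies, compared with a variable that varies at the pair level.

```python
import numpy as np

n_orig, n_dest = 4, 3
o, d = np.meshgrid(np.arange(n_orig), np.arange(n_dest), indexing="ij")
o, d = o.ravel(), d.ravel()

# One dummy column per origin country.
origin_dummies = (o[:, None] == np.arange(n_orig)[None, :]).astype(float)

# A regressor that varies only by origin (hypothetical values).
origin_only = np.array([2.0, 5.0, 1.0, 7.0])[o]

# A regressor that varies across country pairs (hypothetical values).
pair_level = np.log(1.0 + o + 2.0 * d)

rank_base = np.linalg.matrix_rank(origin_dummies)
rank_with_origin_only = np.linalg.matrix_rank(
    np.column_stack([origin_dummies, origin_only]))
rank_with_pair_level = np.linalg.matrix_rank(
    np.column_stack([origin_dummies, pair_level]))

# The origin-only variable adds no new information beyond the dummies,
# whereas the pair-level variable raises the rank by one.
print(rank_base, rank_with_origin_only, rank_with_pair_level)
```

A regressor that leaves the rank unchanged is perfectly collinear with the dummies, so its coefficient cannot be identified and regression software will typically drop it.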

Another limitation of log-linear fixed effects estimation is that it cannot handle zeroes
in the dependent variable, since the logarithm of zero is undefined, and such observations
are instead treated as missing. This adds to the problem in the trade flow data, which
contain structured zeros due to missing observations or ‘false’ zero values. However, as
discussed earlier, the migration and trade data sets are partially treated for this issue
by adding 1 to each observation of migration flows and trade flows. Thus, the log-linear
models can be fitted to the non-negative dependent variables without discarding zero
observations. This avoids the bias that would arise from dropping them.
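The effect of adding 1 before taking logarithms can be sketched as follows. The flow values are hypothetical and the helper name is an assumption; the point is only that the transformed series keeps every observation, whereas the plain logarithm is undefined at zero.

```python
import math

# Hypothetical migrant counts, including two zero flows.
flows = [0, 3, 120, 0, 45000]


def safe_log_plus_one(x):
    """ln(x + 1): defined at zero, where it equals 0."""
    if x < 0:
        raise ValueError("flows must be non-negative")
    return math.log1p(x)


transformed = [safe_log_plus_one(x) for x in flows]

# The plain logarithm fails at zero, so zero flows would be lost.
try:
    math.log(0)
    log_zero_defined = True
except ValueError:
    log_zero_defined = False

print(transformed, log_zero_defined)
```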

c. The Poisson Fixed Effect Regression

An alternative way to address zeroes in the data is to use a Poisson regression, which

also accounts for heteroscedasticity. As a robustness check and alternative method to

estimate gravity models, I also employ a Poisson regression, which is consistent with

fixed effects. These results are discussed separately in the robustness section.
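As a rough illustration of how a Poisson regression accommodates zero flows, the sketch below fits a Poisson model by Newton-Raphson on simulated data. It is not the estimation routine used in this thesis: the data, the parameter values, and the function name `poisson_fit` are assumptions, the regressors are centred for simplicity, and no fixed effects are included. The dependent variable enters in levels, so zero flows remain in the sample without any transformation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Centred, hypothetical regressors standing in for log GDP and log distance.
ln_gdp = rng.normal(0.0, 1.0, n)
ln_dist = rng.normal(0.0, 1.0, n)
true_beta = np.array([0.5, 1.0, -1.0])

X = np.column_stack([np.ones(n), ln_gdp, ln_dist])
flows = rng.poisson(np.exp(X @ true_beta))  # counts, including genuine zeros


def poisson_fit(X, y, n_iter=50, tol=1e-10):
    """Poisson (pseudo-)maximum likelihood with a log link, by Newton-Raphson.

    Assumes the first column of X is a constant.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the mean flow
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        hessian = X.T @ (X * mu[:, None])
        step = np.linalg.solve(hessian, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


beta_hat = poisson_fit(X, flows)
share_zero = float(np.mean(flows == 0))

print(np.round(beta_hat, 2), round(share_zero, 2))
```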


4. Data Description and Structure

This section describes the data and variables used in the empirical analysis.

It also presents summary statistics for the population under study. Variables are defined

in the Appendix.

4.1 Data Sources

To analyze and estimate the two empirical models specified in Section 3, we

require information on bilateral flows (for both migration and trade) and a measure of

linguistic proximity that, along with gravity variables, can establish a meaningful and

correlative relationship. This data was obtained through two different sources to fit both

bilateral flow gravity models.

a. International Migration Data

We use a dataset adapted from Adserà and Pytliková (2015) containing

information on migration flows from 223 source countries to 30 OECD destination

countries over the period 1980-2010. This data was primarily collected through national

statistical offices in OECD countries, and supplemented with data from the OECD

International Migration Database. This is a comprehensive dataset with respect to

destination countries, origin countries, and time. The dataset is quite extensive with

over 100 variables. In addition to the variables available directly, a number of other

variables related to gravity and that impact migration patterns were generated from

existing variables. OECD countries are the most reliable source of migration data. The

list of variables used in the analysis and their definitions is found in Table A.2 in the

Appendix section.

Figure 4 below illustrates migration inflows to OECD countries from source countries
around the world over a period of almost 30 years. We have selected 8 of the 30 OECD
countries in our data so as to roughly cover each region; information on these countries
is also mostly complete, which makes our panel reasonably balanced. The stacked
visualization shows that the USA receives the bulk of immigrants, with fluctuations over
some periods. The number of migrants to the USA increases steadily over the years until it
reaches a peak during the 1990s. Australia and Canada have relatively stable flows, with
the number of migrants growing at an increasing rate compared to the rest of the
countries. Meanwhile, Germany also receives a significant surge of migrants.

Figure 4 Migration Flows to selected OECD Countries from all world, 1980-2006

Source: Author’s tabulation based on data on migration flows

b. Trade Data

In addition to migration data, we also used a gravity dataset from a geographical

database owned by CEPII, a French research center in international economics that

produces research, analyses, and databases on the world economy. This dataset includes

the variables and economic indicators commonly used in gravity models. For my analysis

of trade flows, I am interested in data on the GDP of both destination and source

countries. This dataset contains complete information for all pairs of countries (in total

224) from the period 1948–2006. Data on GDP and population size were obtained from

the World Bank's World Development Indicators (WDI). The CEPII dataset is widely
considered one of the most accurate sources of data on trade flows. The data are collected
from the Direction of Trade Statistics

(DOTS), which often reports two different values for the same trade flow between two


countries, as discussed in Section 3. Some of the new variables were generated from the

data to better control the results. Variable details and definitions are listed in the

Appendix.

Figure 5 below illustrates trade flows to OECD countries from partner countries around
the world over a period of almost 30 years. We have selected 8 countries from our data as
a sample that broadly covers each geographical region of the OECD; the information on each
of these countries is also complete, so our panel for trade data is balanced. The overlaid
visualization shows that all countries have an increasing trend in trade. The USA is the
biggest market in terms of trade volumes. Japan and Germany follow almost the same
pattern, while Canada's trade grows at a relatively stable rate.

Figure 5 Trade Flows to selected OECD Countries from all world, 1980-2006

Source: Author’s tabulation based on data on trade flows

c. Linguistic Distance Measure

We primarily used three different indices of linguistic distance, namely:

i. The Linguistic Proximity Index (newly constructed), based on information

from Ethnologue;


ii. The Levenshtein Distance, developed by the Max Planck Institute for

Evolutionary Anthropology; and

iii. The Dyen Linguistic Proximity Measure proposed by Dyen et al. (1992).

Most previous studies include a simple dummy for whether two countries share

a common language, which may not capture the true impact of language. We instead

use the Linguistic Proximity Index (LPI), used only in a couple of other studies to date.

Compared to a dummy variable, this index is more inclusive, provides a better-adjusted

and smoother indicator of proximity, and is quantifiable. The index ranges from 0 to 1

depending on how many levels of linguistic family tree the languages of the destination

and source countries share. Prior to constructing the index, a set of increasing weights

is defined, as explained by Adserà and Pytliková (2015), as follows:

The first [weight] equal to 0.1 if the two languages are related at the most

aggregated linguistic level; the second equal to 0.15 if two languages belong to the

same second linguistic tree level; the third equal to 0.20 if two languages belong to

the same third linguistic tree level; and the fourth equal to 0.25 if both languages

belong to the same fourth level of linguistic tree family.

If two languages are identical, the Linguistic Proximity Index takes the value 1.
Otherwise, the index is constructed as the sum of the above weights over the linguistic
tree levels that the two languages share, so that it reflects the number of shared
branches of the linguistic family tree. At the two extremes:

Index = 0 if the two languages do not belong to any common language family

Index = 1 if the two countries have a common language

Thus, the Linguistic Proximity Index equals 0.1 if two languages are related only at
the most aggregated level of the linguistic tree; 0.25 if they share the first and second
levels; 0.45 if they share up to the third level; and 0.7 if they share the first four
levels. A good example of the latter case is the Scandinavian languages (Danish,
Norwegian, and Swedish). My analysis includes only the first official language of each
country in a pair, which is sufficient to capture the effects of linguistic proximity in
the hypotheses under study.
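The construction of the index from the four weights can be sketched in a few lines. The function name and its arguments are hypothetical; the weights are those quoted from Adserà and Pytliková (2015) above, and the index is taken to be the cumulative sum of the weights up to the deepest shared level of the linguistic tree.

```python
# Weights for linguistic tree levels 1 to 4, as quoted in the text above.
WEIGHTS = (0.10, 0.15, 0.20, 0.25)


def linguistic_proximity(shared_levels, identical=False):
    """Index value for two languages sharing the first `shared_levels` levels."""
    if identical:
        return 1.0
    if not 0 <= shared_levels <= len(WEIGHTS):
        raise ValueError("shared_levels must be between 0 and 4")
    # Cumulative sum of the weights, rounded to avoid floating-point noise.
    return round(sum(WEIGHTS[:shared_levels], 0.0), 2)


index_values = [linguistic_proximity(k) for k in range(5)]
print(index_values)  # values for 0, 1, 2, 3, and 4 shared levels
```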


The visualization below presents detailed information about the distribution of
migration flows by the linguistic tree level that the source and destination countries
share. It covers all country pairs over the period 1980-2006 and is shown separately for
first official languages, all official and main languages, and major languages.

Figure 6 Migration Flows by Linguistic Proximity Index OECD, 1980-2006

Source: Author’s adaptation7 based on Data on Migration Flows

During the period 1980-2006, almost 110 million people migrated to an OECD country.
Among them, about 14.6 million migrated to countries that share the same first official
language as their country of origin, and about 40 million migrated to countries whose
first official languages had no linguistic tree level in common with that of their country
of origin. The largest group, almost 44 million, migrated to countries whose languages
share only the most aggregated level of the linguistic tree, while about 1.6, 6.9, and 2.1
million migrated to countries sharing up to the second, third, and fourth levels of the
linguistic tree, respectively. The overall pattern is similar when migration is classified
by the major language spoken, though more migrants then move to countries whose major
languages are very distant. When all official and main languages are considered, the flows
to destinations with a common language are strikingly higher. This is partly because many
of these countries share a common colonial past.

7 Adserà and Pytliková (2015)


Applying the same approach as above to the trade data, we derived the distribution of
trade flows by the linguistic tree level that the exporting and importing countries share.
Figure 7 below shows that the overall pattern of the distribution is, interestingly, quite
comparable to that of migration flows. Based on our data, total trade between OECD
countries and all countries of the world amounted to about 100 trillion US dollars during
1980-2006. Of this, about 10.6 trillion US dollars was traded between partners that share
the same first official language, and about 30.3 trillion US dollars between countries
whose first official languages did not have any level in common.

Figure 7 Trade Flows by Linguistic Proximity Index OECD, 1980-2006

Source: Author’s tabulation based on data on trade flows

When all official and main languages are considered, the trade flows to OECD

countries with a common language are strikingly higher, similar to migration patterns.

4.2 Data Merging

To produce a single dataset for estimation purposes, we have merged the trade

and migration data, then extracted the variables of interest. Both datasets include

information about source and destination countries and the corresponding observation

year. This information served as the basis for merging. Countries were listed by their

standard ISO 3-letter codes. In addition, the numeric codes of the two countries were

combined to produce a unique ID for each country pair (i.e. destination country

numeric code × 1000 + source country numeric code). These numeric IDs let us avoid working with

string variables such as the country name or its 3-letter ISO code. We discovered

that the CEPII data incorporated a longer observation period than did the migration

dataset. The migration data covered the period 1980–2010 with 215,040 observations,

while CEPII covered the period 1948–2006 with 1,204,671 observations. Almost 65% of

the migration data was matched with CEPII data, while the remaining observations were dropped.

However, most of the observations recorded for recent years overlapped. Following the

merge, our final dataset contained 137,472 observations over a span of 27 years (1980–2006), which is

a reasonably large sample.
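
The country-pair ID described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the two example codes are chosen purely for illustration, and we assume that every numeric country code has at most three digits, so that multiplying by 1000 keeps the two codes from overlapping.

```python
def make_pair_id(dest_code: int, source_code: int) -> int:
    """Combine two numeric country codes into one country-pair ID."""
    # Assumption: codes have at most three digits (0-999).
    if not (0 <= source_code < 1000 and 0 <= dest_code < 1000):
        raise ValueError("numeric country codes must be in the range 0-999")
    return dest_code * 1000 + source_code


def split_pair_id(pair_id: int) -> tuple:
    """Recover (dest_code, source_code) from a country-pair ID."""
    return divmod(pair_id, 1000)


# Hypothetical example codes for a destination and a source country.
DEST, SOURCE = 578, 586

pair = make_pair_id(dest_code=DEST, source_code=SOURCE)
print(pair)                 # 578586
print(split_pair_id(pair))  # (578, 586)
```

Under the three-digit assumption the mapping is one-to-one, and the ID is directional: swapping source and destination yields a different ID, which matters because flows from i to j differ from flows from j to i.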

4.3 Dependent Variable(s)

We want to study two relationships in parallel: (1) the effect of language proximity on

migration decisions and (2) the effect of language proximity on trade volumes

between any pair of countries. Therefore, Migration Flows and Trade Flows are our

two dependent variables in their respective gravity models.

4.4 Summary Statistics

Key characteristics of the dependent and control variables for our population

under study are summarized in Table 1 below. The table shows the data for the period

1948-2010, comprising 1,282,239 observations over a 62-year span.

Turning first to our migration data, gross migration flows from source country i

to destination country j (where i = 1, 2, …, 223 indexes source countries and j = 1, 2, …, 30 destination

countries) are the dependent variable for one of the models, in which we want to

study the relevance of linguistic proximity between origin and destination countries in

the decision to migrate.

Turning next to our trade data, gross trade flows from exporting country i to

importing country j is the dependent variable to study the significance of language

proximity in shaping international trade patterns. Country pairs are the same for both

models.

GDP is our main economic indicator that enters gravity model specifications.

Exports rise proportionally with the economic size of the destination country, while

imports rise in proportion to the size of the origin country. Further, the size of the GDP

at destination and source countries may also explain variations in migration patterns. To

account for size and development effects, we use GDP (current US $) and GDP per capita

(current US $) separately.

Table 1. Summary Statistics for the variables subject to analysis.

Variables Mean SD Min Max Observations

Year 1982.973 16.55774 1948 2010 1,282,239

Bilateral Flows

Migration Flows_ij 2189.025 24953.77 0 1827167 103,199

Trade Flows_ij 127.901 1995.513 0 348420.6 1,204,671

Linguistic Distance

Linguistic Proximity Index 0.1393973 0.2496661 0 1 215,040

Controls

Distance in Km 7189.407 4488.417 0 19611.12 212,160

GDP pCap – Destination 24579.13 9997.241 5543.572 74113.94 202,720

GDP pCap – Source 10057.98 12237.55 140.0198 123263 156,960

GDP – Destination 4434.455 7831.898 6.813283 89563.63 1,080,866

GDP – Source 4553.223 7933.045 6.813283 89563.63 1,030,200

Notes: The table is restricted to relevant variables only, which are subject to empirical analysis. All variables used in the estimations except dummy variables and the Linguistic Proximity Index are expressed in logarithms.

Distance (in kilometers) represents the distance between the capital cities of

sending and receiving countries. We expect both migration and trade to decrease

as the physical distance between countries increases.

The Linguistic Proximity Index ranges from 0 to 1 depending on the number of

linguistic family tree levels shared by the first official languages of the destination and

source countries. Additional summary statistics and descriptions for the full set of variables

employed in the empirical analyses and robustness checks may be found in Appendix

Table A.1.

Based on our gravity dataset, we can also show the overall trend in

migration and trade for OECD countries over the 27 years from

1980 to 2006, in Figure 8 below:

Figure 8 Development in Migration and Trade Flows for OECD countries 1980-2006

As the figure shows, trade grows smoothly at an increasing rate, without

sharp spikes during the period, while migration flows also increase but

fluctuate much more over the same period.

5. Empirical Results

This section presents the main findings of our econometric models by employing

a series of regression analyses, followed by a discussion. We also investigate the

usefulness of the gravity model for trade and migration, as well as discuss the robustness

of our models and the implications of our results.

5.1 Main Results

Based on the econometric model developed and discussed in Section 3, we begin

by using a conventional ordinary least squares (OLS) approach. Afterwards, we run OLS

regressions with country- and time-fixed effects. This is a common approach when

working with panel data, as it controls for unobserved heterogeneity, that is,

unobserved characteristics that vary by country or by year. In addition, we use robust

standard errors to account for heteroskedasticity in the error term, so that our

standard errors are heteroskedasticity-consistent.

In what follows, we first present individual results for each model separately, and

later present a comparison of the models.

a. Migration Model Estimation Results

Beginning with the migration model, three different estimation results of the log-

linear model introduced in Section 3 through equation (1) are tabulated below in Table

2. The dependent variable is the log of migration flows. All three regressions include

year dummies to control for time-fixed effects. However, only the regression in Column (3)

controls for country-fixed effects, by including a time-invariant fixed effect for each

source and destination country.
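
For orientation, the fullest of the three specifications can be written out as follows. This is a reconstruction from the regressors listed in Table 2, not a copy of equation (1) in Section 3; the symbol names (LPI, GDPpc, Dist, and the fixed-effect terms) are our own shorthand and should be read as assumptions. Here i denotes the source country, j the destination country and t the year.

```latex
\ln M_{ijt} = \beta_0 + \beta_1\,\mathrm{LPI}_{ij}
  + \beta_2 \ln \mathrm{GDPpc}_{j,t-1} + \beta_3 \ln \mathrm{GDPpc}_{i,t-1}
  + \beta_4 \ln \mathrm{Dist}_{ij}
  + \beta_5 \ln \mathrm{GDP}_{j,t-1} + \beta_6 \ln \mathrm{GDP}_{i,t-1}
  + \delta_i + \gamma_j + \theta_t + \varepsilon_{ijt}
```

Column (1) keeps only the Linguistic Proximity Index and the year effects, Column (2) adds the gravity variables, and Column (3) adds the source and destination fixed effects.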

From Table 2 below, we can determine the effect of linguistic proximity and basic

gravity variables on migration flows. The first regression in Column (1) provides

coefficient estimates for a simple OLS regression where the only regressor is the

Linguistic Proximity Index. It gives statistically significant results for a sample

encompassing over 103,000 observations.

Table 2. Effect of Language Proximity and Gravity Variables on Migration Flows

(1) (2) (3)

Regressors OLS OLS Fixed Effects

Linguistic Proximity Index 1.592*** 1.673*** 1.112***

(0.174) (0.116) (0.136)

Ln GDP pCap _Dest_t-1 - 0.265*** -2.345***

- (0.0328) (0.327)

Ln GDP pCap _Orig_t-1 - -0.353*** -0.129

- (0.0199) (0.158)

Ln Distance in km - -0.650*** -1.082***

- (0.0301) (0.0501)

Ln GDP_Dest_t-1 - 0.898*** 2.714***

- (0.0178) (0.330)

Ln GDP_Orig_t-1 - 0.696*** -0.158

- (0.0141) (0.187)

Constant 3.101*** -7.449*** 9.423***

(0.0839) (0.459) (1.480)

Observations 103,199 73,359 73,359

R-squared 0.024 0.641 0.778

Country FE NO NO YES

Dependent Variable: Natural logarithm of migration flows between country i and j at time t. Robust standard errors are reported in parentheses. All models include year dummies. Regressions are controlled for country-specific FE for source and destination in the model of Fixed Effects. *** Statistically significant at the 1 % level p<0.01 ** Statistically significant at the 5 % level p<0.05 * Statistically significant at the 10 % level p<0.1

In Column (2), we add to the OLS model standard gravity variables for country

size and distance. Finally, in Column (3), we further add destination and source country-

fixed effects. In all three regressions, the coefficient on linguistic proximity is positive

and highly significant. Thus, other things held constant, migration flows between two

countries are higher the closer their languages are. Adding the gravity variables for

geographical distance and economic size in Column (2) leaves the coefficient almost

unchanged (1.673 versus 1.592). When country-fixed effects are introduced in Column (3), the

coefficient on linguistic proximity falls to 1.112. This

suggests that other country characteristics may alleviate the pressure of learning the

destination country’s language to integrate socially and economically.

The fixed effect regression in Column (3) absorbs both time- and country-specific

variation in the model to control for heterogeneity that varies across countries (but that

is constant over time) or across time (but that is constant across countries). The

difference between estimated coefficients in each of the regressions can be explained by

these unobserved characteristics. The estimated coefficients from the fixed effect

regression in Column (3) are also significant except for real GDP and GDP per capita at

the origin country, whereas in the simple OLS in column (2), these two GDP variables

were significant.
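
How fixed effects absorb unobserved, time-invariant heterogeneity can be illustrated with a toy example. The sketch below uses made-up data and a single grouping variable, which is a simplification of the source and destination effects used in Column (3); the function names are hypothetical. Demeaning each variable within its group (the within transformation) is numerically equivalent to including one dummy per group.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Made-up data: two countries with different unobserved intercepts.
# Within each country the true slope of y on x is 0.5.
groups = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
intercept = {"A": 0.0, "B": 6.0}
y = [intercept[g] + 0.5 * xi for g, xi in zip(groups, x)]

pooled = ols_slope(x, y)  # confounded by the group intercepts
within = ols_slope(demean_by_group(x, groups), demean_by_group(y, groups))
print(round(pooled, 3), round(within, 3))
```

In this toy case the pooled slope is roughly four times the true value because the group intercepts are correlated with x, while the within estimate recovers 0.5. The same mechanism can move coefficients between Columns (2) and (3).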

The coefficient on linguistic proximity in the gravity model with fixed effects

(Column 3) is 1.112 and is highly significant at the 1% level. The interpretation of the

Linguistic Proximity Index coefficient for each country pair is based on their linguistic

tree level score. Read as a linear approximation, the result in Column (3) implies that emigration flows to a

destination country at the top of the index (1) are around

111% higher than flows to a country at the bottom of the index (0); the exact

transformation, exp(1.112) – 1, implies a difference of roughly 204%. We expect a reasonable

drop in the value of this coefficient in the robustness analysis when additional controls

are added.
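
The step from the coefficient to a percentage effect can be made explicit. The derivation below is a sketch that holds all other regressors fixed and compares an index value of 1 with an index value of 0; the notation is our own shorthand rather than the notation of equation (1).

```latex
\ln M_{ij} = \dots + \beta_1\,\mathrm{LPI}_{ij} + \dots
\quad\Longrightarrow\quad
\frac{M_{ij}\,(\mathrm{LPI}=1)}{M_{ij}\,(\mathrm{LPI}=0)} = e^{\beta_1},
\qquad
\%\Delta M = 100\,\bigl(e^{\beta_1} - 1\bigr)
           = 100\,\bigl(e^{1.112} - 1\bigr) \approx 204 .
```

The shortcut of reading 100 × β directly as a percentage is accurate only for small coefficients; for a coefficient above 1 the two readings diverge substantially.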

Across all three regressions, we can observe a drastic increase in the explanatory

power of the model from 2.4% (simple OLS regression in Column 1) to 78% (FE

regression in Column 3) as indicated by the R2. This means that our regressors, together with the fixed effects, account

for 78% of the variance in log migration flows in our sample. The explanatory power of a gravity

model generally ranges from 60 to 80% (Feenstra, 2016). This increase in R2 can also be

explained by the addition of country- and year-fixed effects to absorb much of the

heterogeneity. In contrast, the R2 value of 2.4% from the simple OLS regression in

Column 1 indicates that there remains much unobserved heterogeneity not accounted

for by the model.

The estimated coefficient on GDP per capita in Column (2) is significant and

positive for the destination country, but significant and negative for the source country.

These signs correspond to our expectation that migration flows tend to go from

relatively poorer to relatively richer countries. A 10% increase in a destination country’s

GDP per capita is associated with a 2.6% increase in immigration. The negative sign on GDP per

capita in the source country indicates that potential emigrants have less incentive to migrate

as economic opportunities in their own country of origin grow. In contrast to Column

(2), the fixed effects model in Column (3) predicts that migration flows are inversely

related to GDP per capita of the destination country. This is an unexpected result that

goes against economic theory and may indicate that GDP per capita in the

destination country is correlated not only with the response variable but also with other

predictors in the model, most obviously GDP itself. Real GDP at destination has a positive and significant impact

on migration flows in Column (3), where a 10% increase in the GDP of the destination

country is associated with a 27% increase in immigration flow to that country. This is

also in line with economic theory that relatively larger and economically stronger

countries attract more immigrants, ceteris paribus, as prospective immigrants tend to

migrate looking for better standards of living and job opportunities. Meanwhile,

real GDP and GDP per capita in the origin country enter Column (3)

negatively but insignificantly and with small magnitudes, whereas both were significant in Column (2).

This implies that, once country-fixed effects are included, GDP and GDP per capita at source are associated with an economically

and statistically insignificant decrease in migration flows.

Geographical distance is also an important determinant of migration. We expect

that shorter distances between countries are significantly associated with larger

migration flows. Therefore, to control for the effect of geographic distance, we include

a regressor for the log of the distance (in kilometers) between the capital cities of the

sending and receiving countries. The estimated coefficients in Columns (2) and (3) show

that distance has a statistically significant and negative impact on migration flows.

Specifically, in the Column (3) model, a 10% increase in the physical distance between

capital cities is associated with an almost 11% decrease in migration flows. Migration

costs increase when countries are further apart as transportation costs increase with

distance. It is also interesting to note that, in Column (3), linguistic distance and physical

distance have a similar impact (in terms of magnitude) on migration flows.
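
Because both the dependent variable and distance enter in logs, the distance coefficient is an elasticity. The short sketch below (function name hypothetical) compares the rule-of-thumb reading used in the text, coefficient times percentage change, with the exact change implied by a log-log model for the Column (3) distance coefficient.

```python
def pct_change_from_elasticity(beta: float, pct_change_in_x: float) -> float:
    """Exact percent change in y implied by a log-log coefficient beta."""
    return ((1.0 + pct_change_in_x / 100.0) ** beta - 1.0) * 100.0


distance_beta = -1.082  # Table 2, Column (3)

approx = distance_beta * 10.0  # rule of thumb: beta times the % change in x
exact = pct_change_from_elasticity(distance_beta, 10.0)

print(f"rule of thumb: {approx:.2f}%")
print(f"exact:         {exact:.2f}%")
```

For a 10% change the two readings are close (roughly 11% versus just under 10%), so the approximation in the text is reasonable here; the gap widens for larger changes in distance or larger coefficients.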

There are other factors that may affect migration flows but are not included in our

model, such as the stock of migrants from the same country of origin already residing in

the destination country, unemployment constraints, social security systems, public

social expenditures, political stability, and employment rates. In the absence of such

variables, the coefficient on GDP may be biased. This raises the possibility of omitted

variable bias, which is addressed in the robustness analysis in the next section. However,

for our purposes, using standard gravity variables is sufficient to establish a causal

relationship with respect to migration. Furthermore, the interaction between the gravity

variables and linguistic proximity does not undermine the importance of language as a

determinant of migration.

To summarize, we find that language proximity is an important determinant of

migration between two countries, even after controlling for gravity variables for

economic size and geographic distance. In our model, whether or not countries share a

common language affects immigration flows to a larger extent than does GDP, while

linguistic distance and geographical distance have almost the same impact on migration.

In short, the more similar the languages and the shorter the distance between two

countries, the greater the migration between them.

b. Trade Model Estimation Results

Turning now to the trade model, the same methodology used in the migration

analysis was applied to our augmented gravity model of trade to determine how

language proximity affects levels of trade between trading partners. From the log-linear

form of our trade model in equation (2), estimation results are tabulated in Table 3

below. As before, year dummies are included in all three regressions to control for time-

fixed effects.

Table 3. Effect of Linguistic Proximity and Gravity Variables on Trade Flows

(1) (2) (3)

Regressors OLS OLS Fixed Effects

Linguistic Proximity Index 0.924*** 0.619*** 0.418***

(0.155) (0.0662) (0.0686)

Ln GDP pCap _Dest_t-1 - 0.111*** -1.851***

- (0.0225) (0.203)

Ln GDP pCap _Orig_t-1 - 0.147*** 1.097***

- (0.0132) (0.0858)

Ln Distance in km - -0.709*** -0.959***

- (0.0200) (0.0299)

Ln GDP_Dest_t-1 - 0.698*** 2.225***

- (0.0130) (0.212)

Ln GDP_Orig_t-1 - 0.743*** -0.617***

- (0.00862) (0.0953)

Constant 2.322*** -7.861*** 1.857*

(0.0413) (0.293) (0.956)

Observations 137,472 123,880 123,880

R-squared 0.014 0.742 0.829

Country FE NO NO YES

Dependent Variable: Natural logarithm of trade flows between country i and j at time t. Robust standard errors are reported in parentheses. All models include year dummies. Regressions are controlled for country specific FE for each exporting & importing countries in the model of Fixed Effects. *** Statistically significant at the 1 % level p<0.01 ** Statistically significant at the 5 % level p<0.05 * Statistically significant at the 10 % level p<0.1

Table 3 presents a set of regression results for the effect of linguistic distance and

basic gravity variables on trade flows. Column (1) shows coefficient estimates for the

standard OLS regression where Linguistic Proximity Index is the only regressor. It gives

statistically significant results for the full sample under study, encompassing around

137,000 observations.

Column (2) adds standard gravity variables for economic size and geographic

distance, while Column (3), our model of interest, further incorporates country-fixed

effects to control for country-specific characteristics of each exporting and importing

country that are constant over time.

The coefficient on linguistic proximity is positive and highly significant in all

three specifications. Thus, other things being equal, we find that the volume of trade

between two countries is higher if their official languages are closer. These results hold

for aggregate exports and imports.

As expected, the coefficient on linguistic proximity decreases in magnitude as

more controls are added, from Column (1) to Column (2) to Column (3). The inclusion

of country-fixed effects in Column (3) shows that the differences between

estimated coefficients across the three regressions are due in part to unobserved

characteristics. The coefficient on linguistic proximity in the gravity model with

fixed effects in Column (3) is 0.418, which is highly significant at the 1% level. Read as a linear approximation, this implies

that trade flows between countries i and j are around 42% higher when the countries

share the same first official language than when their languages have no level in common; the exact transformation, exp(0.418) – 1, implies roughly 52%.

The estimated coefficients on all the economic variables (i.e. GDP and GDP per

capita) in the OLS regression in Column (2) are statistically significant and positive, and

appear to substantiate economic theory. Trade between exporting and importing

countries increases with increases in their respective GDP level or GDP per capita levels.

For example, a 10% increase in the GDP of an exporting country (i.e. source country)

is associated with a 7.4% increase in its trade volume. In the estimation results with fixed effects in

Column (3), we find that GDP and GDP per capita are significant both for destination

and source, but with opposing signs. The estimated coefficient on GDP per capita of the

importing country j is statistically significant and negative,

suggesting that a 10% higher GDP per capita in the importing country is associated with

trade volumes that are about 18.5% lower. This contrasts with macroeconomic theory, which suggests

that a country’s imports are positively affected by the country’s national income. We

also expect our results to be consistent with the economic theory that exports rise

proportionally with the economic size of the destination country and imports rise in

proportion to the size of the source economy. In our estimates, the GDP per capita of

the exporting country has a positive and significant effect: A 10% increase in the GDP

per capita of the exporting country (source) is associated with an increase in its

exports of around 11%; equivalently, imports at the destination

increase. We also expect less developed countries to import more than they

export due to their lower capacities and underutilization of resources. To account for

the economic size of countries, we used real GDP (at current US $) for both exporting

and importing countries. The estimated coefficient shows that GDP for importing

country j is highly significant at the 1% level with a positive impact on trade flows. These

results strongly support the economic theory that trade increases proportionally with

the relative economic size of the countries, ceteris paribus. However, the GDP of the

exporting country i is significant and negative. Thus, a 10% increase in the GDP of the

exporting country is associated with a decline of about 6% in trade volume, while a 10%

increase in the GDP of the importing country implies an approximately 22% increase in

bilateral trade. The result for the importing country’s GDP is consistent with trade theory, whereby GDP

has a positive effect on trade, simply because economically larger countries tend to trade

more. This, in turn, means increased globalization and economic development.

Geographical distance is an integral part of gravity models and an important

determinant of trade patterns. We expect our empirical model to show that bilateral

trade flows are inversely proportional to the distance between trading countries. Indeed,

the estimated coefficient on distance has a statistically significant and negative impact

on trade. Specifically, from Column (3), a 10% increase in the physical distance between

capital cities is associated with a decrease in aggregate trade volumes of almost 10%.

Additionally, we find that R2 equals 0.829 in the country-fixed effects regression

model of Column (3), indicating that our model explains 83% of the variance in log

trade flows in our sample. The higher R2 compared to the conventional OLS

regressions in Columns (1) and (2) demonstrates that controlling for country-fixed

effects captures much of the unobserved heterogeneity in the model.

Certainly, there are other economic and non-economic factors that contribute to

increased bilateral trade flows, such as trade policies, regional trade agreements (RTAs),

treaties signed by the member countries of multinational organizations like the WTO,

the EU or by a country’s neighbors. A common ethnic background or historical ties could

also favor trade between countries. The absence of these variables in our model could

induce omitted variable bias, whereby the coefficient on GDP is potentially biased. This

issue is discussed separately in the robustness analysis section. However, for our

objective of assessing the explicit role of language, using a standard gravity model is

sufficient to establish a causal relationship with respect to trade.

To summarize, we find that linguistic distance is an important determinant of

trade. Trade will be negatively impacted when two countries speak entirely distinct

languages. Linguistic distance and geographical distance both affect trade flows, but

with different magnitudes. Our model shows that trade between two countries is higher when

they share a similar language, have higher levels of GDP, and have a shorter distance

between them.

5.2 Robustness

To test the robustness of our results, we include a set of additional controls. In

addition, we also estimate a Poisson regression with fixed effects. The results are

presented in Table 4 below. As an additional check, we also use alternative measures of

linguistic proximity, presented in Table 5.

To ensure comparability with our previous econometric results for migration and

trade, we include the same control variables as in our default econometric models while

running robustness checks. In addition to these control variables, we also include new

dummy variables. The coefficients on the dummy variables are interpreted as the mean

change in the dependent variable (i.e. either migration or trade flows) when the dummy

changes values from 0 to 1, holding all other variables constant.

First, we include a dummy variable for common colonial past, which controls for

whether countries share a common historical path or are tied to a common ethnic

background. The dummy takes the value 1 if there is a common colonial past between

countries i and j, and 0 otherwise. A common history might decrease the cultural

distance between countries and increase the information available about the potential

destination country (Venturini, 2018 Discussion Paper). We also add a dummy to

control for whether a present colonial relationship still exists between countries. This

dummy takes the value 1 if country pairs are still in a colonial relationship. As an

additional control for the effect of distance, we add neighboring country dummies,

which take the value of 1 if two countries are neighbors (i.e. they share a common border).

We may relate this to the border puzzle of McCallum (1995), which states that

“intra-national trade is much higher than inter-national trade”. We are interested in

identifying how sharing a common border affects patterns of trade, and whether

people prefer to migrate to a neighboring country.

Furthermore, we include dummies taking the value 1 if the destination or

origin country, respectively, is a member of the GATT or the WTO.

Finally, we also include a dummy that takes the value 1 if there is a regional trade

agreement in force between countries i and j.

Empirically, the presence of migration networks – that is, a network of family

members, friends, or other people of the same origin already living in the host country

– is also expected to reduce migration costs (Massey et al., 1993; Munshi, 2003).

However, we do not include it in our robustness checks to avoid any possible

collinearity with GDP per capita through population from the source country. At the

same time, we also believe that country-fixed effects would capture the unobserved

characteristics of migration networks.

We ran a fixed effect regression and a Poisson regression, with year- and

country-specific fixed effects, with two different specifications (one for migration, the

other for trade), as displayed in Table 4. Columns (1) and (2) show the fixed effect

regression results for migration and trade, while Columns (3) and (4) show the

Poisson fixed effect regression results for migration and trade, respectively.

Table 4. Robustness Checks: Controlling with additional dummies and variables. Fixed Effect Regression and Poisson Estimation with fixed effect

(1) (2) (3) (4)

Regressors Fixed Effect Fixed Effect Poisson Poisson

Dep. Var. Ln(Migration Flows_ij) Ln(Trade Flows_ij) Ln(Migration Flows_ij) Ln(Trade Flows_ij)

Linguistic Proximity Index 0.822*** 0.212*** 0.286*** 0.140***

(0.124) (0.0675) (0.0367) (0.0326)

Ln GDP pCap _Dest_t-1 -2.102*** -1.781*** -0.175** -0.823***

(0.341) (0.205) (0.0868) (0.0806)

Ln GDP pCap _Orig_t-1 0.0116 0.985*** -0.0514 0.130***

(0.160) (0.0860) (0.0426) (0.0303)

Ln Distance in km -1.106*** -0.855*** -0.280*** -0.238***

(0.0528) (0.0330) (0.0142) (0.0157)

GATT/WTO Dummy_Dest 0 0.133 0 0.0860*

Omitted (0.0901) Omitted (0.0482)

GATT/WTO Dummy_Orig 0.171*** 0.0227 -0.0467** 0.00690

(0.0407) (0.0251) (0.0184) (0.0166)

RTAij Dummy -0.250*** 0.339*** 0.306*** 0.277***

(0.0688) (0.0458) (0.0388) (0.0384)

Colonialij Past Dummy 1.603*** 0.901*** -1.195*** 0.102

(0.150) (0.0808) (0.419) (0.164)

Colonialij Curr Dummy -4.074*** -0.213 -0.204*** -0.157***

(0.808) (0.448) (0.0502) (0.0420)

Neighboringij Dummy 0.0367 0.529*** 0.251*** 1.003***

(0.181) (0.105) (0.0865) (0.0822)

Ln GDP_Dest_t-1 2.400*** 2.156*** -0.0356 0.0224

(0.343) (0.212) (0.0497) (0.0327)

Ln GDP_Orig_t-1 -0.327* -0.516*** -0.0467** 0.00690

(0.189) (0.0946) (0.0184) (0.0166)

Constant 11.86*** 0.133 -2.271*** -2.271***

(1.503) (0.0901) (0.355) (0.355)

Observations 69,524 123,880 69,524 123,880

R-squared 0.789 0.742 - -

Country FE YES YES YES YES

Robust standard errors are reported in parentheses. All models include year dummies. Regressions are controlled for country specific FE for each i & j in all of the specifications. *** Statistically significant at the 1 % level p<0.01 ** Statistically significant at the 5 % level p<0.05 * Statistically significant at the 10 % level p<0.1

The results from the robustness analyses indicate that language is still an

important factor in shaping migration and trade flows, even after other factors are

taken into account. The coefficient on the Linguistic Proximity Index is positive and

highly significant in all specifications. However, its magnitude decreases

as more controls are included. The decreased effect of linguistic proximity is more

prominent in the trade model than in the migration model. This suggests that other

factors alleviate the pressure of needing to learn the language of the destination

country to integrate in the new society and labor market.

Language differences are less of a barrier to trade when other trade-favoring

factors are present between two countries, for example due to the presence of regional

trade agreements (RTAs). The coefficient on RTA with respect to trade flows is

positive and highly significant: The value 0.339 implies that the presence of an RTA increases

trade by 40% (i.e., exp(0.339) – 1 ≈ 0.40), and by 32% under Poisson (i.e., exp(0.277) – 1 ≈ 0.32).
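
The conversion from a dummy coefficient to a percentage effect can be checked numerically. The sketch below applies the transformation exp(β) – 1 to several trade coefficients taken from Table 4; the function name and dictionary labels are hypothetical.

```python
import math


def dummy_effect_pct(beta: float) -> float:
    """Percent change in the outcome when a dummy switches from 0 to 1
    in a model whose dependent variable is in natural logs."""
    return (math.exp(beta) - 1.0) * 100.0


# Trade coefficients as reported in Table 4.
coefficients = {
    "RTA, fixed effects (col. 2)": 0.339,
    "RTA, Poisson (col. 4)": 0.277,
    "Colonial past, fixed effects (col. 2)": 0.901,
    "Neighboring, fixed effects (col. 2)": 0.529,
    "GATT/WTO destination, Poisson (col. 4)": 0.0860,
}

for label, beta in coefficients.items():
    print(f"{label}: {dummy_effect_pct(beta):.1f}%")
```

The transformation matters most for large coefficients: for a coefficient below about 0.1 it is nearly identical to 100 × β, whereas for the colonial-past coefficient it gives an effect well above 100%.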

We find that the coefficient on the GATT/WTO dummy for both exporting and

importing countries is statistically insignificant in Column (2) of the trade regressions.

This implies that membership in the GATT/WTO does not have a substantial effect

on trade, in line with Andrew K. Rose’s (2004) claim that there is little empirical

evidence that member countries of the GATT/WTO have better trade patterns

than pairs of countries outside the GATT/WTO. We also find that the coefficient for

GATT/WTO for the importing country becomes significant under the Poisson

regression for trade, as shown in Column (4), though only at the 10% level. GATT/WTO membership of the

importing country is associated with only about 9% (exp(0.086) – 1 ≈ 0.09) more trade,

an effect that is less statistically significant than others (e.g.

RTA).

The coefficient on the GATT/WTO dummy for the importing country is

omitted in the migration model due to collinearity, simply because destination

countries are all OECD members and members of the GATT/WTO. Including

dummies for GATT/WTO and RTA in the migration model might appear to be

meaningless, but they were nevertheless included to maintain symmetry between the

trade and migration models. Interestingly, the RTA dummy is significant and negative

in the fixed effects migration model in Column (1).

Dummies for common historical past and neighboring country are statistically

significant and enter positively in the trade model, as seen in Column (2). Accordingly,

having a past colonial tie increases trade by more than 100%, while sharing a common

border increases trade volumes by 69%. These results are consistent with economic

theory. In the migration model, our main findings are robust to the inclusion of

common historical past. As we can see in Column (1), the coefficient on the colonial

past dummy is positive and significant. However, if the countries are still in a colonial

relationship with each other, then this will have a negative impact on migration flows.

Meanwhile, the coefficient on sharing a common border is not statistically

significant in defining migration patterns, implying that sharing a border

does not affect migration decisions between countries; rather, a prospective

migrant puts more weight on other factors in deciding whether and where to relocate.

The magnitudes and signs of the gravity variables in the robustness

regressions, namely GDP and distance, are similar to their values in our main

econometric models of both migration and trade. In the robustness checks, GDP per

capita at destination is still significant and negative in all specifications from Columns

(1) through (4), while GDP per capita at origin is insignificant with respect to

migration flows but highly significant and positive in relation to trade flows. Real GDP

at destination has a significant and a positive impact on both migration patterns and

trade flows in Columns (1) and (2), but real GDP at source impacts trade and migration

negatively. Lastly, the coefficient on geographic distance is negative and highly

significant in all specifications in Columns (1) through (4), implying that regardless of

other factors, migration and trade are inversely proportional to the physical distance

between countries.

Including alternate measures of linguistic distance

To further test the robustness of our results, we also use a set of alternative

measures of linguistic distance, as displayed in Table 5. First, we run the regression using

our standard Linguistic Proximity Index (based on Ethnologue). Next, we use the

Levenshtein distance developed by the Max Planck Institute for Evolutionary

Anthropology. Finally, we use the Dyen linguistic proximity measure proposed by Dyen

et al. These indices were explained in the data description in Section 4.2.

Table 5 below displays three regression estimates each for the migration and trade models. Column (1) in each set is the baseline model using the same Linguistic Proximity Index as in our original empirical models. Column (2) instead uses the Levenshtein Index (divided by 100) and Column (3) uses the Dyen Index (divided by 1,000). These divisions simply rescale the alternative indices so that their range is comparable to that of our standard Linguistic Proximity Index. As before, each specification compares the first official languages of each country pair. Lastly, the control variables included in each specification are the economic size variables, the distance variable, the additional dummies used in the robustness checks above, and country-fixed effects.

It is worth highlighting that the Levenshtein index is defined in terms of distance (not proximity) between languages, so we would expect its coefficient to have a negative sign. Column (2) of each set in Table 5 confirms this: the coefficient on the Levenshtein index is negative and highly significant in both the migration and trade models.

As for the Dyen index, it is worth noting that it covers only Indo-European languages, so that about 50% of the observations in the sample are dropped from the regression. Nevertheless, Column (3) of each set in Table 5 shows that the coefficient on the Dyen index is positive and significant. In the trade model, the coefficient of 0.430 implies that countries with the same first official language (i.e. a Dyen index of 1,000) trade about 0.43 log points more, or roughly 54% more (exp(0.430) − 1 ≈ 0.54), than do countries with rather dissimilar languages.
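The percentage reading of a coefficient in a log-linear model can be made explicit. The following is a minimal sketch, not part of the thesis's Stata code; the function name is illustrative. When the dependent variable is in logs, a one-unit increase in a regressor with coefficient β changes the outcome by exactly exp(β) − 1, which is close to β itself only when β is small. The coefficients are those reported for the trade equation in Table 5.

```python
import math

def pct_effect(beta: float) -> float:
    """Exact percentage change in the outcome of a log-linear model
    when the regressor rises by one unit: 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

# Coefficients reported in Table 5 (trade equation).
table5_trade = {"Linguistic Proximity": 0.212, "Dyen": 0.430}

for name, beta in table5_trade.items():
    approx = 100.0 * beta  # the common "beta times 100" shortcut
    print(f"{name}: shortcut {approx:.1f}%, exact {pct_effect(beta):.1f}%")
```

The gap between the shortcut and the exact figure widens as the coefficient grows, so larger coefficients such as the one on the Dyen index are better reported with the exact formula.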

Overall, these results show that language proximity continues to have a

significant effect on both trade and migration, regardless of how “language proximity”

is defined and measured.

Estimation results are tabulated below, as Table 5.

Table 5. Robustness Checks: Alternative Measures of Linguistic Proximity (Dyen and Levenshtein) for First Official Languages with Controls

                        Dep. Var: Ln (Migration Flowsij)    Dep. Var: Ln (Trade Flowsij)
Regressors              (1) F.E     (2) F.E     (3) F.E     (1) F.E     (2) F.E     (3) F.E

Linguistic Proximity    0.822***    -           -           0.212***    -           -
                        (0.124)                             (0.0675)
Levenshtein             -           -0.745***   -           -           -0.254***   -
                                    (0.134)                             (0.0751)
Dyen                    -           -           1.189***    -           -           0.430***
                                                (0.144)                             (0.0775)

Observations            69,524      67,326      36,283      57,397      55,702      30,199
R-squared               0.789       0.788       0.802       0.874       0.873       0.898
Country FE              YES         YES         YES         YES         YES         YES

Notes: Robust standard errors, clustered at the country-pair level, are reported in parentheses. Controls included: economic variables, distance variable, additional dummies, year dummies, and destination and origin country fixed effects. *** statistically significant at the 1% level (p<0.01); ** statistically significant at the 5% level (p<0.05); * statistically significant at the 10% level (p<0.1).

5.3 Econometric Issues

Next, we describe the econometric issues we attempted to address. The list is not exhaustive, and other issues may remain.

a. Multicollinearity

One of the biggest issues with panel data and gravity equations is multicollinearity, which arises when one variable is an exact linear function of the other regressor(s), or when two or more regressors are highly correlated. Stata automatically detects perfect collinearity (by dropping the coefficients on such variables), but near-collinearity is more difficult to diagnose (Baum, 2006, Chap. 4). Near-collinearity arises when pairwise correlations of regressors are high. Collinearity of sufficient magnitude can adversely affect regression results: with near-collinearity, small changes in the data matrix cause large changes in the estimates. Although the overall fit of the regression (as measured by R² or adjusted R²) may be very good, the coefficients may have high standard errors and perhaps even incorrect signs or implausibly large magnitudes. Another possible source of perfect multicollinearity arises when using multiple binary or dummy variables as regressors, known as the dummy variable trap. We avoid this problem by excluding one of the binary variables, as is standard practice. To test for multicollinearity, we use the Variance Inflation Factor (VIF), defined as:

$$VIF = \frac{1}{1 - R^2}$$

VIF measures the degree to which the variance of an estimated coefficient is inflated because its regressor is not statistically independent of the other regressors. Here, R² is the coefficient of determination from regressing that regressor on all the other regressors. The quantity 1 − R² = 1/VIF, called the tolerance, is the share of the variance in that regressor that is not accounted for by the other variables. VIF can be computed after fitting a regression model. As a rule of thumb, there is evidence of collinearity if the mean VIF is greater than unity or if the largest VIF is greater than 10. Table 6 below reports the VIF values. The largest VIF among the listed regressors is below 2 in both the migration and trade models, well under the threshold of 10, which shows that there is no severe collinearity among them. However, the mean VIF in both models (migration flows and trade flows) is greater than 1.00, which indicates that there might be some degree of correlation between the independent variables.
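The relationship between the auxiliary R², the tolerance, and the VIF can be checked numerically. The sketch below is illustrative only (the function name is an assumption, not the thesis's Stata code). It recomputes the VIF column of Table 6 for the migration model from the reported tolerance values, using R² = 1 − tolerance.

```python
def vif_from_r2(r2_aux: float) -> float:
    """Variance inflation factor from the R-squared of the auxiliary
    regression of one regressor on all the others."""
    if not 0.0 <= r2_aux < 1.0:
        raise ValueError("auxiliary R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r2_aux)

# Tolerance values (1/VIF) reported in Table 6, migration model.
tolerance_migration = {
    "Linguistic Proximity Index": 0.921344,
    "Ln GDP pCap_Dest_t-1": 0.776016,
    "Ln GDP pCap_Orig_t-1": 0.597229,
    "Ln Distance in km": 0.895169,
    "Ln GDP_Dest_t-1": 0.833396,
    "Ln GDP_Orig_t-1": 0.623118,
}

for name, tol in tolerance_migration.items():
    r2_aux = 1.0 - tol  # share of the regressor explained by the others
    print(f"{name}: auxiliary R2 = {r2_aux:.3f}, VIF = {vif_from_r2(r2_aux):.2f}")
```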

Table 6. Multicollinearity Diagnostic: The VIF Method

                              Ln (Migration Flowsij)    Ln (Trade Flowsij)
Variables                     VIF     1/VIF             VIF     1/VIF

Linguistic Proximity Index    1.09    0.921344          1.05    0.953790
Ln GDP pCap_Dest_t-1          1.29    0.776016          1.45    0.691135
Ln GDP pCap_Orig_t-1          1.67    0.597229          1.49    0.670328
Ln Distance in km             1.12    0.895169          1.10    0.909665
Ln GDP_Dest_t-1               1.20    0.833396          1.23    0.815320
Ln GDP_Orig_t-1               1.60    0.623118          1.50    0.665550

Mean VIF                      3.22                      1.97

b. Heteroscedasticity and Serial Correlation

Serial correlation (or autocorrelation) occurs when one observation’s error term is correlated with another observation’s error term. Because the error term consists of time-varying factors that affect the dependent variable but are not included as regressors, some of these omitted factors may be autocorrelated, causing the standard errors of the coefficients to be smaller and R² to be larger than they would otherwise be. Because our gravity model uses panel data on consecutive years for country pairs, with lagged observations of some of the variables, autocorrelation is highly likely.

Heteroscedasticity means that the variance of the regression error terms conditional on the regressors is not constant (as opposed to homoscedasticity, where it is constant). To control for both heteroscedasticity and correlation over time, all our estimates are based on robust standard errors. We use the vce(cluster id) option in our regression estimations, with the country pair as id, to obtain standard errors that are robust to heteroscedasticity and autocorrelation. This option is widely used by researchers.

vce(robust) uses the robust or sandwich estimator of variance. This estimator is robust to some types of misspecification so long as the observations are independent.

vce(cluster clustvar) specifies that the standard errors allow for intragroup correlation, relaxing the usual requirement that the observations be independent. That is, observations are independent across groups (clusters) but not necessarily within groups.
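The difference between the two variance estimators can be illustrated for the slope of a simple regression. This is a minimal sketch under stated assumptions: the data and all function names are hypothetical, only one regressor is used, and no finite-sample correction is applied (statistical packages typically add one). It is not the estimator code used for the results in this thesis. The robust estimator squares each observation's score (x_i − x̄)e_i, whereas the cluster estimator first sums the scores within each country pair and then squares the sums.

```python
from collections import defaultdict

def ols_fit(x, y):
    """Simple regression of y on x with an intercept.
    Returns the slope, residuals, mean of x, and sum of squares of x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return slope, resid, xbar, sxx

def robust_var(x, y):
    """Heteroscedasticity-robust (sandwich) variance of the slope."""
    _, resid, xbar, sxx = ols_fit(x, y)
    return sum(((xi - xbar) * ei) ** 2 for xi, ei in zip(x, resid)) / sxx ** 2

def cluster_var(x, y, groups):
    """Cluster-robust variance of the slope: scores are summed within
    each cluster before squaring, allowing within-cluster correlation."""
    _, resid, xbar, sxx = ols_fit(x, y)
    score = defaultdict(float)
    for xi, ei, g in zip(x, resid, groups):
        score[g] += (xi - xbar) * ei
    return sum(s ** 2 for s in score.values()) / sxx ** 2

# Hypothetical data: three country pairs observed in two years each.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.9, 5.6, 5.8]
pair_id = ["A", "A", "B", "B", "C", "C"]

print("robust SE: ", robust_var(x, y) ** 0.5)
print("cluster SE:", cluster_var(x, y, pair_id) ** 0.5)
```

When every observation forms its own cluster, the two estimators coincide, which is a convenient sanity check.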

c. Endogeneity

The problem of endogeneity occurs when an explanatory variable is correlated

with the error term. As highlighted in the section above, we cannot completely

overcome this problem. However, it is partly addressed by using country-specific fixed

effects for both destination and source countries. Nevertheless, the complete absence of

omitted variable bias is not guaranteed.

d. Year-Fixed Effects and Country-Fixed Effects

It is important to include year-fixed effects in panel data regressions to control for economic shocks, such as booms and recessions, that are common to all countries but vary over time. We control for source-country, destination-country, and time-fixed effects through dummy variables that take the value 1 if a given source country, destination country, or year appears in a particular observation, and 0 otherwise. With these fixed effects, our model has higher explanatory power than the conventional OLS model without fixed effects.
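The construction of the fixed-effect dummies can be sketched as follows. The observations, country codes, and function name are hypothetical and serve only to illustrate the idea; the thesis's estimations were run in Stata. One level of each factor is omitted as the reference category, which avoids the dummy variable trap discussed above.

```python
def make_dummies(values, prefix):
    """Return (column_names, rows) of 0/1 indicators for the levels of
    `values`, dropping the first sorted level as the reference category."""
    levels = sorted(set(values))
    kept = levels[1:]  # omit one level to avoid perfect collinearity
    names = [f"{prefix}_{level}" for level in kept]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return names, rows

# Hypothetical observations: (source, destination, year).
obs = [
    ("POL", "NOR", 2004),
    ("POL", "DEU", 2004),
    ("TUR", "DEU", 2005),
    ("TUR", "NOR", 2005),
]

src_names, src_rows = make_dummies([o[0] for o in obs], "src")
dst_names, dst_rows = make_dummies([o[1] for o in obs], "dst")
yr_names, yr_rows = make_dummies([o[2] for o in obs], "year")

# Combine the three sets of indicators into one block of regressors.
header = src_names + dst_names + yr_names
design = [s + d + t for s, d, t in zip(src_rows, dst_rows, yr_rows)]

print(header)
for row in design:
    print(row)
```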

e. Other Econometric Concerns

We have tried to obtain coefficient estimates that are as free of bias as possible. Still, some issues may remain. We attempted to control for some of these pitfalls through robustness checks and by adding additional variables. The results of our robustness checks are largely consistent with the main findings of the original empirical models, though less so for GDP and GDP per capita, whose coefficients vary across specifications.

There is also the possibility that using linguistic proximity as the only proxy for

culture may bias the results. There are other factors beyond language that reflect cultural

closeness, such as genetic distance, common traditions, and ethnicity, which we did not

take into account. Finally, we cannot rule out the importance of trade liberties and

preferential trade policies in enhancing business ties.

6. Comparative Analysis

An important objective of our research was to conduct a comparative analysis of our augmented gravity models to estimate the impact of language on trade and migration. To keep the comparison simple and straightforward, we used the same set of variables in both the trade and migration models. OLS regression with year- and country-fixed effects and robust standard errors is our preferred method for estimating the coefficients on our covariates. The gravity model appears to have been successful in explaining most of the variation in bilateral trade and migration across the full sample of observations. Overall, we find that countries that are economically larger and relatively closer in geographic distance are more engaged in bilateral trade and attract more immigrants. In addition, the Linguistic Proximity Index is found to be highly significant in all specifications and behaves as expected alongside the gravity variables. Our estimates are not only statistically significant but also in line with the economic literature. Taken together, the gravity variables and cultural variables do a decent job of explaining both international migration and bilateral trade.

Using the same set of countries and similar econometric methods, and holding

other variables constant, we find that linguistic distance is highly significant in

determining trade flows and migration patterns between source/exporting and

destination/importing countries. As shown in the results section, the Linguistic

Proximity Index is positive and significant in all specifications (i.e. OLS and FE

regressions) in both models. However, the coefficient on linguistic proximity is much

larger for migration. This may imply that language is relatively more important in

migration decisions than it is in trade.

In terms of the other independent variables, GDP and GDP per capita in the destination country were found to be statistically significant in both the trade and migration models, but with opposing signs. In particular, GDP per capita at destination had a negative coefficient, while GDP at destination had a positive coefficient, in both models. While this result may not be in line with economic theory, both GDP variables do behave similarly in the two models. For its part, geographic distance has almost the same impact in the trade and migration models. Our results support the prediction of economic theory that countries that are relatively closer in geographical distance tend to attract more immigrants from each other and to trade more with each other. Numerically, a 10% decrease in the physical distance between the capital cities of two countries is associated with an increase in aggregate trade volumes of almost 10% and an increase in migration flows of almost 11%. We also find that linguistic distance and geographical distance have almost the same impact on migration. For a potential migrant, language and distance are thus similarly important factors in the migration decision.
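Because both flows and distance enter the models in logs, the distance coefficient is an elasticity, and the percentage figures above follow from it directly. The sketch below is illustrative: the function name is an assumption, and the coefficients are the distance coefficients from the SURE estimates in Table 7, chosen here only as an example. The linear shortcut (coefficient times the percentage change) gives about 9.5% for trade and 10.8% for migration, while the exact log-log calculation gives slightly larger values.

```python
def pct_change_from_elasticity(elasticity: float, pct_change_x: float) -> float:
    """Exact percentage change in y implied by a log-log coefficient
    when x changes by pct_change_x percent."""
    ratio = 1.0 + pct_change_x / 100.0
    return 100.0 * (ratio ** elasticity - 1.0)

# Distance coefficients from Table 7 (illustrative choice of specification).
distance_elasticity = {"migration": -1.076, "trade": -0.951}

for flow, beta in distance_elasticity.items():
    linear = beta * -10.0  # first-order approximation
    exact = pct_change_from_elasticity(beta, -10.0)
    print(f"{flow}: linear {linear:.1f}%, exact {exact:.1f}%")
```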

We also fit a seemingly unrelated regression (SURE) model (Zellner, 1962), in its most basic form, to test whether our two econometric equations are related through the correlation in their error terms. This model takes into account the fact that subtle interactions may be present between individual statistical relationships when each of these relationships is being used to model some aspect of behavior. The Stata output is tabulated in Table 7 below:

Table 7. Seemingly Unrelated Regression (SURE) - Migration and Trade Equations

Variables                     SUREG (Migration)    SUREG (Trade)

Linguistic Proximity Index    1.098***             0.352***
                              (0.0280)             (0.0232)
Ln GDP pCap_Dest_t-1          -2.334***            -2.248***
                              (0.177)              (0.147)
Ln GDP pCap_Orig_t-1          -0.177***            1.000***
                              (0.0610)             (0.0507)
Ln Distance in km             -1.076***            -0.951***
                              (0.00986)            (0.00820)
Ln GDP_Dest_t-1               2.618***             2.668***
                              (0.186)              (0.155)
Ln GDP_Orig_t-1               -0.138**             -0.599***
                              (0.0639)             (0.0531)

Observations                  69,524               69,524
R-squared                     0.779                0.863
Country F.E                   YES                  YES

Correlation matrix of residuals:

                         Ln Migration Flowsijt    Ln Trade Flowsijt
Ln Migration Flowsijt    1.0000
Ln Trade Flowsijt        0.2476                   1.0000

Breusch-Pagan test of independence: Chi2(1) = 4260.892, Pr. = 0.0000

The summary output indicates that each equation explains a large share of the variation in the observed migration and trade flows (R² of 0.779 and 0.863, respectively). The correlation matrix reports the estimated correlation between the residuals of the two equations, and the Breusch-Pagan test checks whether the two residual vectors (error terms) are independent. The residual correlation is positive and sizeable (0.2476), and the Breusch-Pagan test rejects its null of independence of the residual series at the 1% level. We can also see that the regression coefficients, standard errors, and R² change when the equations are estimated with SURE, indicating that eq. (1) and (2) from the migration and trade models are related through their residuals. However, this does not change our interpretation of the original econometric models under study.
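The Breusch-Pagan statistic reported with Table 7 can be approximately reproduced from the residual correlation. As a hedged sketch (the function names are illustrative and this is not the Stata routine itself), the Lagrange multiplier statistic for independence across equations is N times the sum of squared pairwise residual correlations, and with two equations it has one degree of freedom. Using the rounded correlation of 0.2476 and N = 69,524 gives a value of roughly 4,262, close to the reported 4260.892; the small gap is consistent with rounding of the correlation.

```python
import math

def breusch_pagan_lm(n_obs: int, residual_corrs: list[float]) -> float:
    """LM statistic for independence of equations: N times the sum of
    squared pairwise correlations between equation residuals."""
    return n_obs * sum(r * r for r in residual_corrs)

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree
    of freedom, using the identity P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Values reported with Table 7: N = 69,524 and residual correlation 0.2476.
lm = breusch_pagan_lm(69_524, [0.2476])
print(f"LM statistic = {lm:.1f}, p-value = {chi2_sf_1df(lm):.4f}")
```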

7. Conclusions

In this thesis, we have investigated the role of cultural differences in migration and trade patterns using a refined indicator, the Linguistic Proximity Index. A gravity model approach is adopted for this purpose to capture and separate the effects of economic characteristics, geography, and cultural barriers on migration and trade. Previous empirical research into the determinants of migration and trade had rarely gone beyond using a simple dummy for sharing a common language. In our research, we instead adopt a more sophisticated and more accurate measure of linguistic proximity. Furthermore, few studies in the existing literature consider trade and migration patterns in parallel as we have done. Instead, trade and migration determinants have mostly been studied through gravity equations either separately or with one as an explanatory variable for the other. We believe that many dynamics are common to both migration and trade patterns. Under our joint approach, we are able to conduct a comparative analysis of the migration and trade models to gain a better understanding of our coefficients of interest. In this way, we have attempted to contribute something new to the existing literature on the subject.

To address our research question, we use a panel dataset on migration flows and trade flows from 223 source countries to 30 OECD destination countries for the period 1980-2006. We first employ conventional OLS on the full sample of countries, then extend the analysis using fixed effects regressions, and finally check the robustness of our findings with the Poisson fixed effects method. Starting with simple gravity variables, we take the analysis a step further by including additional controls through dummies. For the most part, the results remain consistent across all specifications, with little variation. Based on our empirical model(s), we conclude that:

Language proximity is highly significant across all model specifications and all

the econometric approaches used. We find that migration rates and trade

volumes are higher between countries whose first official languages are closer.

The results are robust to the use of two alternate measures of linguistic distance

(i.e. Levenshtein distance and Dyen linguistic proximity). Furthermore, linguistic

distance poses a relatively greater barrier for migration than for bilateral trade.

Traditional economic push and pull factors, like GDP per capita and real GDP, behave differently in the migration and trade models. This contradicts the theoretical expectation that the level of GDP should positively affect bilateral trade and migration flows for both sending and receiving countries. However, these results do not contrast sharply with the previous literature. Because its results are more consistent across specifications, linguistic proximity does a better job than differentials in economic variables of explaining the direction of migration flows and trade patterns.

Geographical distance has a statistically and economically significant impact on trade and migration flows in all sets of models. Countries that are geographically farther apart trade less, and countries that are geographically closer to a particular destination send more migrants to it. In addition, linguistic distance and physical distance are found to have almost the same impact on migration and on trade. Although the magnitudes of the physical distance and linguistic distance effects on migration and trade differ across econometric methods, their signs and significance are robust across specifications.

The extension of the models with additional dummies shows that countries belonging to the same regional trade agreement trade more. Furthermore, a shared colonial history encourages trade and is also associated with a larger influx of immigrants. Meanwhile, the dummy for neighboring countries has a significantly positive effect on trade but an insignificant effect on migration. Finally, we find that membership in the GATT/WTO has a surprisingly insignificant impact on trade. This challenges the general perception that multilateral trade organizations like the WTO promote trade.

Although we account for zero-valued observations, endogeneity, and heteroscedasticity, there remains a risk of measurement error or bias in our results. Overall, however, our empirical strategy based on the gravity model works well in explaining most of the factors contributing to variations in trade flows and migration patterns. Furthermore, language similarity has a substantial effect on trade and migration above and beyond the gravity variables. This finding opens the door to future research on the topic. As patterns of trade and migration are influenced by an interplay of several other determinants, an instrumental variable approach may be an interesting way to infer a causal relationship between migration and trade. Cultural barriers (as we find) dampen migration flows and bilateral trade, but immigrants can counter the effects of this cultural distance on trade and vice versa, as product diversity requires ethnic diversity. We have studied trade flows and migration patterns exclusively to OECD destinations, which are developed countries. However, there is reason to believe that the mechanisms driving migration and trade between developed countries differ from those driving migration and trade between developing and developed countries.

Bibliography

Anderson, J. E. (1979). A Theoretical Foundation for the Gravity Equation. The American Economic

Review, 69(1), 106-116 .

Anderson, J. E. (2011). The Gravity Model. Annual Review of Economics, 3(1), 133-160.

Barry R. Chiswick, P. W. (2015). Handbook of the Economics of International Migration. Elsevier.

Baum, C. F. (2006). An Introduction to Modern Econometrics Using Stata. Texas: StataCorp LP.

Belot , M., & Ederveen , S. (2012). Cultural Barriers in Migration Between OECD Countries. Journal

of Population Economics.

Bergstrand, J. H. (1985). The Gravity Equation in International Trade: Some Microeconomic

Foundations and Empirical Evidence. The Review of Economics and Statistics, 67(3), 474-

481.

Brakman, P. A. (2010). The Gravity Model in International Trade. Cambridge: Cambridge University

Press.

CEPII Gravity Database. (2018). Retrieved from http://www.cepii.fr/cepii/en/bdd_modele/presentation.asp?id=8

Christian Thiemann. (2010). The Structure of Borders in a Small World. PLoS ONE, 5(11).

Cohen, K. K. (2010). Determinants of International Migration Flows to and from Industrialized

Countries: A Panel Data Approach Beyond Gravity. The International Migration Review,

44(4), 899-932.

David Karemera, V. I. (2000). A gravity model analysis of international migration to North America. Applied Economics, 32(13), 1745-1755.

De, P. (2013). Assessing Barriers to Trade in Services in India: An Empirical Investigation. Journal of

Economic Integration, 28(1), 108-143.

Ederveen, M. B. (2012). Cultural barriers in migration between OECD countries. Journal of

Population Economics, 25(3), 1077-1105.

Egger, P. (2000). A note on the proper econometric specification of the gravity equation. Economic

Letters, 66(1), 25-31.

Feenstra, R. C. (2016). Advanced International Trade: Theory and Evidence. New Jersey: Princeton

University Press.

Filippo Simini, M. C.-L. (2012). A universal model for mobility and migration patterns. Nature, 484, pp. 96-100.

Friberg, J. H. (2012). 13. The stages of migration from going abroad to settling down : Post

Accession Polish migrant workers in Norway. Journal of Ethnic and Migration Studies,

38(10), 1589-1605.

Felbermayr, G. J., & Toubal, F. (2010). Cultural Proximity and Trade. European Economic Review, 54(2), 279-293.

Gautier Krings, F. C. (2009). Urban gravity: a model for inter-city telecommunication flows. Journal

of Statistical Mechanics: Theory and Experiment, 2009.

Gidwani, V., & Sivaramakrishnan, K. (2004). Circular Migration and the Spaces of Cultural Assertion. Wiley Online Library.

Giuliano, A. A. (2015). Culture and Institutions. Journal of Economic Literature, 53(4), 898-944.

Gokmen, G. (2017). Clash of Civilization and the Impact of Cultural Differences on Trade. Journal of Development Economics.

Gourdon, J. (2009). Journal of Economic Integration, 24(1).

Head, K. a. (2015). Gravity Equations: Toolkit, Cookbook, Workhorse. 4, 131-195. (H. o. Economics., Ed., & K. R. Elhanan Helpman, Compiler) Elsevier. Retrieved from https://sites.google.com/site/hiegravity/

Head, K. M. (2010). The erosion of colonial trade linkages after independence. Journal of

International Economics, 81(1), 1-14.

International Migration Report. (2017). The United Nations, Department of Economic and Social

Affairs Population Division . New York: The United Nations.

Isaac, J. (1947). Economics of Migration. (D. K. Mannheim, Ed.) London: Hunt, Barnard and Co. Ltd.

J.Felbermayrt, G., & Toubal, F. (2007). Cultural Proximity and Trade.

J.Lewera, J., & Bergb, H. d. (2008). A gravity model of immigration. Economic Letters, 99(1), 164-

167.

Jacques Melitz, F. T. (2014). Native language, spoken language, translation and trade. Journal of International Economics, 93(2), 351-363.

JAIN, S. M. (2015). Determinants of OFDI: An Empirical Analysis of OECD Source Countries using

Gravity Model. Indian Economic Review, New Series, 50(2), 243-271 .

James E. Anderson, E. v. (2003). Gravity with Gravitas: A Solution to the Border Puzzle. American

Economic Review, 93(1), 170-192.

Kingsley, G. (1946). The P1 P2/D Hypothesis: On the Intercity Movement of Persons. American

Sociological Review, , 11(6), 677-686.

Klay, S. (2011). Explaining the stages of migration within a life course framework. European

Sociological Review, 27(4), 469-486.

Kónya, I. (2006). Modeling Cultural Barriers in International Trade. Review of International

Economics, 14(3), 494-507.

Ku, H., & Zussman, A. (2010). Lingua Franca: The Role of English in International Trade. Journal of

Economic Behavior and Organization.

Lassmann, P. E. (2011). The Language Effect in International Trade: A Meta-Analysis. CESifo

Working Paper Series 3682, CESifo Group Munich.

Liu, C. W. (2010). Determinants of Bilateral Trade Flows in OECD Countries: Evidence from

Gravity Panel Data Models. The World Economy, 33(7), 894-915.

Lohmann, J. (2011). Do Language Barriers affect Trade. Economic Letters.

Marc Barthélemy. (2011). Spatial Networks. Physics Reports, 499(1-3), 1-101.

Mauro Lanati, A. V. (2018). Cultural Change and the Migration Choice. IZA Discussion Paper No.

11415.

Mayda, A. M. (2010). International Migration: A Panel Data Analysis of the Determinants of

Bilateral Flows. Journal of Population Economics, 23(4), 1249-1274.

Melitz, J., & Toubal, F. (2014). Native Language, Spoken Language, Translation and Trade. Journal

of International Economics.

Migration and Migrants: A Global Overview , World Migration Report (2018). Geneva: International

Organization for Migration.

Millimet, D. J. (2008). Is Gravity Linear? Journal of Applied Econometrics, 23(2), 137-172.

Nicita, A., & Tumurchudur-Klok, B. (n.d.). Geneva: UNCTAD.

Organisation for Economic Co-operation and Development. (2017). Retrieved from National Accounts - OECD: http://www.oecd.org/sdd/na/

Pablo Kaluza, A. K. (2010). The complex network of global cargo ship movements. J. R. Soc.

Interface, 7(48), 1093-1103.

Peter H.Egger, A. (2012). The language effect in international trade: A meta-analysis. Economics

Letters, 116(2), 221-224.

Peter Egger. (2000). A note on the proper econometric specification of the gravity equation.

Economics Letters, 66(1), 25-31.

Pytliková, A. A. (2015). The Role of Language in Shaping International Migration. The Economic

Journal, 125(586), F49-F81 (Feature Issue).

Pöyhönen, P. (1963). A Tentative Model for the Volume of Trade between Countries. (w. JSTOR, Ed.) Weltwirtschaftliches Archiv, 90(1963), 93-100.

ROSE, A. K. (2004). Do We Really Know That the WTO Increases Trade? The American Economic

Review, 94(1), 98-114.

The World Bank. (2017). Retrieved from World Bank National Accounts Data: https://data.worldbank.org/

Thomlinson, R. (1961). A Model for Migration Analysis. Journal of the American Statistical

Association, 56(295), 675-686.

Tiiu Paas, E. T. (2008). Gravity Equation Analysis in the Context of International Trade: Model

Specification Implications in the Case of the European Union. Eastern European Economics,

46(5), 92-113.

UN-DESA. (2015). World Population Prospects. Retrieved from The United Nations Department of Economic and Social Affairs: https://esa.un.org/unpd/wpp/publications/files/key_findings_wpp_2015.pdf

UNDP. (2009). United Nations Development Program. Retrieved from http://www.undp.org/content/undp/en/home/librarypage/corporate/undp_in_action_2009.html

VANDERKAMP, J. (1977). THE GRAVITY MODEL AND MIGRATION BEHAVIOUR: AN

ECONOMIC INTERPRETATION. Journal of Economic Studies, 4(2), 89-102.

VK Srivastava, D. G. (1987). Seemingly Unrelated Regression Equations: Estimation and Inference.

New York: Marcel Dekker Inc.

Watson, J. H. (2015). Introduction to Econometrics. London: Pearson.

Weber, V. G. (Ed.). (2016). The Palgrave Handbook of Economics and Language. London: Palgrave Macmillan.

White, R. (2016). Cultural Differences and Economic Globalization: Effects on Trade, Foreign Direct

Investment, and Migration. Oxon and New York: Routledge.

Wincoop, J. E. (2003). Gravity with Gravitas: A Solution to the Border Puzzle. The American

Economic Review, 93(1), 170-192.

Woo-Sung Jung, F. W. (2008). Gravity Model in the Korean Highway. Europhysics Letters

Association, 81(4).

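Zellner, A. (1962). An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias. Journal of the American Statistical Association, 57(298), 348-368.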

Zimmermann, A. F. (Ed.). (2013). International Handbook on the Economics of Migration. Cheltenham, UK: Edward Elgar.

Zimmermann, K. F., & Bauer, T. (Eds.). (2002). The Economics of Migration (Vol. 1). Cheltenham: Edward Elgar Publishing Inc.

Appendix

Table A.1

Summary Statistics – Additional Variables from Full Sample

Variables Observations Mean S.D Min Max

Levenshtein Index 208,320 87.63829 23.59133 0 106.39

Dyen 100,608 414.3834 277.7419 110.6 1000

Population_Dest 1,164,851 32.21703 110.311 .0197 1311.798

Population_Source 1,136,274 32.59678 113.4587 .0197 1311.798

Stockij 82,892 38266.71 609110.4 0 4.17e+07

GATT/WTO_Dest Dummy 1,204,671 .6014323 .4896036 0 1

GATT/WTO_Source Dummy 1,204,671 .5645425 .495817 0 1

RTA Dummy 1,204,671 .0271095 .1624025 0 1

Past Colonial Dummy 1,204,671 .017081 .1295734 0 1

Current Colonial Dummy 1,204,671 .0035852 .0597692 0 1

Neighbor Dummy 215,040 .0183036 .1340471 0 1

Table A.2

List of Variables and Definition – Full Sample

Variable Definition

Year Observation year

ID Pair of country ID (Destination + Source)

Destination Name of destination country j

Source Name of source country i

Migration Flowij Immigration inflow from country i to j

Trade Flowij Trade volume, Export or import

Linguistic Proximity Index Measure of language closeness or distance - official language

Levenshtein Max P distance between official languages

Dyen Dyen linguistic proximity between first official languages

Distance Capitals Distance between capital cities in kilometers

GDP pCap_Destination GDP per Capita in US $ in destination

GDP pCap_Source GDP per Capita in US $ in source

GDP_Destination GDP in US $ in destination

GDP_Source GDP in US $ in source

Stockij Foreign population from country i residing in j

GATT/WTO_ Dest Dummy variable = 1 if destination is GATT/WTO member

GATT/WTO_ Source Dummy variable = 1 if source is GATT/WTO member

RTA Dummy variable = 1 if Regional Trade Agreement in force

Colonial History Dummy Variable = 1 for pair ever in colonial relationship

Colonial Current Dummy Variable = 1 for pair currently in colonial relationship

Neighbor Dummy Variable= 1 for neighbor country, destination and source share a common border

Table A.3

List of OECD Destination Countries

Name of the countries

1. Australia 16. Korea

2. Austria 17. Luxembourg

3. Belgium 18. Mexico

4. Canada 19. Netherlands

5. Czech Republic 20. New Zealand

6. Denmark 21. Norway

7. Finland 22. Poland

8. France 23. Portugal

9. Germany 24. Slovak Republic

10. Greece 25. Spain

11. Hungary 26. Sweden

12. Iceland 27. Switzerland

13. Ireland 28. Turkey

14. Italy 29. United Kingdom

15. Japan 30. United States

Source Countries All world countries, total 223 in number.

Table A.4

Definitions and Technical Notes

Migration Flow: The inflow of immigrants to a destination from a given origin in a given year. The definition usually covers immigrants coming for a period of half a year or longer. More generally, flow refers to the number of migrants entering or leaving a country during a given period.

International Migrant: any person who changes his or her country of usual residence.

Foreign Population Stock: The total number of foreigners or international migrants from a given source country living in a particular destination in a given year.

Citizenship and Country of Birth: The main criteria used for categorizing migrant stocks and flows are country of birth and citizenship. Citizenship indicates the particular legal bond between an individual and his or her country, acquired by birth or naturalization, whether by declaration, choice, marriage, or other means. Country of Birth refers to the country of residence of the mother at the time of the birth or, failing that, the country in which the birth took place.

Trade Flow: Total trade value, either imports or exports, for each country pair. One country’s exports are another country’s imports, so trade flows from the exporting country to the importing country.

Destination and Source Countries: In the migration context, the destination is the country that receives immigrants, and the source countries are those sending them. For trading partners, the exporting country is the source and the importing country is the destination.

Linguistic Proximity: Language similarity or closeness. The Linguistic Proximity Index ranges from 0 to 1, depending on how many levels of the linguistic family tree the languages of the two countries share.
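As a rough illustration of how such an index can be built, the sketch below counts the levels two languages share in a family tree and scales the count to the 0 to 1 range. The equal weight per level, the four-level depth, and the simplified language paths are assumptions made for illustration only; they are not the weights or classifications used in this thesis.

```python
# Illustrative sketch only: equal weight per shared tree level (an assumption).

# Hypothetical, simplified classification paths (family -> ... -> language).
ENGLISH = ("Indo-European", "Germanic", "West Germanic", "English")
NORWEGIAN = ("Indo-European", "Germanic", "North Germanic", "Norwegian")
FINNISH = ("Uralic", "Finnic", "Northern Finnic", "Finnish")


def shared_levels(path_a, path_b):
    """Count how many leading levels of the tree two languages share."""
    count = 0
    for level_a, level_b in zip(path_a, path_b):
        if level_a != level_b:
            break
        count += 1
    return count


def proximity_index(path_a, path_b):
    """Return a value in [0, 1]: 0 = no shared level, 1 = same language."""
    depth = max(len(path_a), len(path_b))
    return shared_levels(path_a, path_b) / depth


print(proximity_index(ENGLISH, NORWEGIAN))  # 0.5 (two of four levels shared)
print(proximity_index(ENGLISH, FINNISH))    # 0.0 (different families)
print(proximity_index(ENGLISH, ENGLISH))    # 1.0 (same language)
```

Published indices typically assign unequal weights to the different tree levels; the uniform step used here is only the simplest choice.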

Bilateral Trade: The exchange of goods between two countries.

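Gravity Model: A model used to estimate the amount of interaction between two entities. It is based on an analogy with Newton’s law of universal gravitation: the predicted interaction increases with the size of the two entities and decreases with the distance between them.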
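For reference, a generic textbook form of the gravity equation is sketched below. The symbols are assumptions introduced here for illustration (F_ij for the flow from i to j, M_i and M_j for economic size, D_ij for distance, G for a constant, and epsilon_ij for an error term) and need not match the specification estimated in the thesis.

```latex
F_{ij} = G \, \frac{M_i^{\beta_1} M_j^{\beta_2}}{D_{ij}^{\beta_3}}
\quad\Longrightarrow\quad
\ln F_{ij} = \beta_0 + \beta_1 \ln M_i + \beta_2 \ln M_j - \beta_3 \ln D_{ij} + \varepsilon_{ij},
\qquad \beta_0 = \ln G
```

Taking logarithms turns the multiplicative relationship into a linear one, which is why gravity equations are usually estimated in log form.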

GDP and GDP per Capita: Gross domestic product (GDP) measures the economic size of a country, and GDP per capita (GDP divided by population) captures development effects. Together they describe how large and how rich an economy is.

Multicollinearity: Occurs when two or more predictor variables are highly correlated, so that one predictor can be used to predict another.
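A minimal sketch of how this can be checked for two predictors: compute their Pearson correlation r and the implied variance inflation factor, VIF = 1 / (1 - r^2). The function names and the sample figures are hypothetical and are not taken from the thesis data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x, y):
    """Variance inflation factor when a regression has exactly two predictors."""
    r_squared = pearson(x, y) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)


# Hypothetical values of log GDP and log GDP per capita for five countries.
log_gdp = [25.1, 26.3, 27.0, 28.4, 29.2]
log_gdp_pc = [8.9, 9.6, 9.8, 10.5, 10.9]

r = pearson(log_gdp, log_gdp_pc)
vif = vif_two_predictors(log_gdp, log_gdp_pc)
print(f"correlation = {r:.3f}, VIF = {vif:.1f}")
```

A VIF of 1 means the predictors are uncorrelated; a common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity.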

Endogeneity: Occurs when an explanatory variable is correlated with the error term. It can arise from measurement error, serial correlation, simultaneous causality, selection bias, and omitted variables.

Autocorrelation: Serial correlation, or autocorrelation, occurs when one observation’s error term is correlated with another observation’s error term. In other words, errors in time-series or cross-sectional data carry over from one period or observation to the next. Serial correlation is problematic because it tends to make the estimated standard errors of the coefficients smaller, and the R-squared higher, than they would otherwise be.
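One common diagnostic for first-order serial correlation is the Durbin-Watson statistic, sketched below for a list of regression residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The function name and the residuals are hypothetical; the statistic is shown for illustration and is not necessarily the test applied in the thesis.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: one fully persistent series, one alternating series.
persistent = [1.0, 1.0, 1.0, 1.0]
alternating = [1.0, -1.0, 1.0, -1.0]

print(durbin_watson(persistent))   # 0.0 -> strong positive autocorrelation
print(durbin_watson(alternating))  # 3.0 -> negative autocorrelation
```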
